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ABSTRACT 

The design of a failure detection and identification (FDI) system consists 
of designing a robust residual-generation process and a high-performance decision- 
making process. In this research the design of these two processes were 
examined separately. 

Residual-generation is based on analytical redundancy. Redundancy relations 
that are insensitive to modelling errors and noise effects are important for 
designing robust residual -generation processes. The characterization of the 
concept of analytical redundancy in terms of a generalized paurity space, as 
presented in this thesis, provided a fraunework in which a systematic approach to 
the determination of robust redundancy relations was developed. 

The Bayesian approach was adopted for the design of high-performance 
decision processes. The FDI decision problem was formulated as a Bayes sequential 
decision problem. Since the optimal decision rule is incomputable, a methodo- 
logy for designing suboptimal rules was proposed. A numerical algorithm was developed 
to facilitate the design and performance evaluation of suboptimal rules. This 
design approach was applied to an example, and the results were compared with 


those of Monte Carlo simulations. 
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CHAPTER 1 
INTRODUCTION 

Physical systems are often s\ibjected to unexpected changes, such as 
component failures and variations in operating conditions, that tend to 
degrade overall system performance. We will refer to such changes as fail- 
ures, although there may not be any physical failure present. Maintaining 
a certain level of performance under failure is the objective of reliable 
system designs. In some cases, it is possible to design a systm that is 
relatively insensitive to certain failures without explicitly detecting than. 
However, the inevitable tradeoff is reduced effectiveness of the system 
during normal conditions. Therefore, explicit failure detection and accom- 
modation may be more desiraJsle if such degraded overall performance must be 
avoided. Another situation where explicit failure detection and identifica- 
tion is required is one when an appropriate back-up actuator or sensor needs 
to be activated to replace the faulty one. Here, one needs to know which 
instrument should be used. Although failure detection and acco mm o d ation re- 
present a single objective, it is often reasonable to assume that the appro- 
priate remedy for each possible failure is known. From this perspective, 
the detection aujd identification of failures can be treated as a separate 
problem 2 U)d this is the subject of this thesis research. 

1.1 Problem Description 

The study of failure detection ^md identification (FDI) in dynamical 
systems is based on the analysis of the structure and behavior of systems, 
which are described by mathematical models. In this research, we are 
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SMinly concerned with the linear, time-invariant stochastic, discrete 
time model: 

q 

x(k+l) - Ax(k) + I b.u (k) +C(k) (1-11 

j-1 ^ J 

y^Ot) “ CjX(k) +Tij(k), j=l,...,m (1-2) 

where x is the n-dimi_nsional state vector, u, ,...,u are the q known 

1 q 

actuator inputs, and y. ,>>*,y are the m sensor outputs (measurements); 

1 m 

^ and n are independent zero mean, white Gaussian (noise) sequences with 
covariance 


E{C(k)^'(t)} - 

E{r)(k)r)’ (t)} - r6. ^ 
k,t 


where 6, ^ is the Kronecker delta. The column vector b. corresponds to the 
k,t j ^ 

j-th actuator and input u^, and the row vector corresponds to the j-th 
sensor. Equations (1-1) and (1-2) cure used to model a dynamical system in 
the normal mode, i.e. in the no-fail situation. 

Failures represent abrupt changes. Hence, various failure modes (fail- 
ure types) can be modelled as deviations from the normal mode. A faulty 
sensor may take the form of a change in c^, a bias, or increase 1 measurement 
noise in (1-2). A malfunctioning actuator may manifest itself as a shift in 

b , ^uld an actuator "stuck" at a cert' n position that causes an input bias 
J 
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may be described by a bias in (1-1) . In some applications, the linear 
model (1-1)- (1-2) is used to represent the linearised behwior of a nonlinear 
system at a particular operating point. A change in the set point can 
result in a different set of system matrices, i.e. A, {b^}, and {c^}. Thus, 
shifts in all the system matrices are often necessary in order to model such 
a change. 

Each failure is characterized by three attributes: 1) the failure 

mode or failure type (i) , e.g. a biased sensor or a "stuc)('' actuator,. 

2) the failure time (T)-the time at which the failure occurs, and 3} the 
magnitude (extent) of the failure (V), e.g. the size of a sensor bias. 

By the very nature of a failure, these attributes are not )tnown. Depending 
on the situation, not all three attributes are of equal importance. Consider, 
for instance, the problem of n failed sensor, with the availability of 
bac)(-up sensors, being able to identify the failure mode (failed sensor) 
may provide acceptable overall performance. However, if we want to compen- 
sate an estimate of x based on, for example, the Kalman filter (KF) [1] for 
the error due to a failed sensor, wc need to identify the failure time and 
the failure magnitude as well as the failure mode. When bac)c-up sensors are 
not avaJlable, we have to ma)ce use of a degraded sensor. Then, we need to 
determine both the failure mode and the failure magnitude (e.g. the size of 
the bias that has developed in the sensor). However, iv is scanetim's neces- 
sary to estimate both T and v in order to do a good job of identifying i, 
even when the failure mode is the only important parameter. This is analogous 
to the problem of estimating a subset of the state variables of a system. 
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In order to obtain accurate estimates of these variables, it is sometimes 
necessajry to use a full order filter to estimate the entire state vector. 

In any case, the detection of a failure (i,T,v) requires the examination of 
the measurement for the failure's characteristic effects. 

In recent yeeurs, numerous approaches (e.g. the voting scheme [2]C3], 
the generalized likelihood ratio (GLR) method [4] [5] , the multiple model 
method [5] [6] and the detection filters of Beard [7] and Jones [8]) have 
been developed to perform FDI in dynamical systems with linear stochastic 
models. A comprehensive survey that includes a description of the underlying 
principles and a discussion of the advantages and shortcomings of the various 
methods has been prepared by Willsky [9] . With such a wealth of background 
information available we shall forgo a detailed review of previous work in 
FDI. Instead, we proceed to the basic structure of a FDI system and the 
issues that require careful consideration during the design of such a system. 

The FDI process can be thought of conceptually as consisting of two 
stages: residual-generation and decision-making. For a particular set of 

hypothesized failures, a general FDI system has the basic structure shown in 
Figure 1-1. Outputs from sensors are initially processed to enhance possible 
hypothesized failure effects so that they can be easily recognized. The 
processed measurements are called the residuals, «md this enhanced effect of 
a failure is called the signature of the failure. Intuitively, the residuals 
represent the difference between the observed sensor outputs and the expected 
sensor outputs in the normal mode. In the absence of a failure, the 
residuals should be xinbiased, showing agreement between observed and expected 
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normal behavior of sensor outputs, and a failure signature often takes the 
form of residual biases that are characteristic of the failure. The residual- 
generation process can be of varying degrees of complexity for different types 
of FDI systems. For example, in a voting system, the residuals are simply 
the differences of the outputs of the various pairs of like sensors, whereas 
in the GLR system, the resid\ials (which are also the filter residuals) are 
generated by the more complex KF. 

In the decision process the residuals are examined for the presence of 
failure signatures. Decision functions or statistics are first calculated 
using the residuals. Then, a decision rule is applied to the decision sta- 
tistics to determine if any failure has occurred in the system. A decision 
process may consist of a simple threshold test on the instantaneous values 
or moving averages of the residuals, or it may be based on more sophisticated 
from statistical decision theory e.g. the sequential probability ratio test 
(SPRT) [10]. 

The design of a FDI system requires the consideration of several issues. 
The immediate concern is the performance of the detection system, i.e. how 
responsive the system is to failures and how accurate the decisions are. 
Unfortunately, systems that respond quickly to abrupt changes are necessarily 
sensitive to noise effects. Thus, a tradeoff exists between detection speed 
and detection accuracy, in addition, the detection probad>ilities , i.e. the 
prob«d)ilities of correct detections and cross-detections (declauring one type 
of failure, when, in fact, another has occurred) cannot be arbitrarily 
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specified as puameters of the design. They represent additional tradeoffs 
inherent in the FDI design problem. These performance tradeoff issues can 
be most directly considered in the design of the decision process of the FDI 
system rather than the residual generation process. One of the goals of this 
research is to develop an approach for designing decision processes that 
systematically exaunines the tradeoffs among the various performance issues. 

A desirable and important quality for a practical FDI system to possess 
is robustness, i.e. the relative insensitivity of the system's performance 
to peurameter variations and modelling errors or uncertainties. An ideal ap- 
proach to designing a robust system is to include all uncertainties in the 
problem specification, and a robust design will result from optimizing (in 
some sense) the performamce of the system with the uncertainties. However, 
this generally leads to a complex mathematical problem that is too difficult 
to solve from a practical point of view. At the other extreme, a sisqpler 
etltemative approach is to ignore all modelling uncertainties in the perfor- 
mance optimization process. The resulting design is then evaluated in the 
presence of modelling errors. If the degradation in performamce is toler- 
able, the design is accepted, otherwise, it is modified and re-evaluated. 
Although this iterative method often yields acceptable designs, it has several 
serious drawbacks. Since the effects of the uncertainties cire not directly 
determined, it is often unclear what parts of the design should be modified 
emd what form the modifications should take. Furthermore, each iteration 
may b^ very expensive to carry out since extensive Monte Carlo simulations are 
often required for perfonocmce evaluation. 
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A better approach that considers the possible modelling errors directly 
is suggested in the work of Deckert, et.al. on aircraft sensor FDI problems 
[11] . The basic idea there is to identify the parts of the system that axe 
known well 2 uid those that may contain substantial uncertainties. Then a FDI 
system (i.e. its residual-generation stage) is designed based primarily on 
the well-known parts (ouid only secondarily on the less well-known paurts) of 
the system behavior. For example, the velocity and acceleration of an air- 
craft are related in two ways. Aerodynamic forces that give rise to the 
aircraft's acceleration are functions of the velocity (and other variables). 
Ho^vever, this function relating velocity and acceleration is only known em- 
pirically and can be rather inaccurate. On the other hand, the kinematical 
relationship between velocity and acceleration is governed by a well-known 
physical relationship, v=a. Therefore, the perfor^^ce of a design based on 
the kinematical relationship is insensitive to system parameter variations, 
while a design based on the aerodynamics is sensitive to such variations. 

Because, modelling errors affect the residual-generation process directly, 
the above approacli suggests that robustness can effectively be achieved by 
designing a robust residual-generation process. We will adopt this approach 
to the robustness issue, as it is much simpler than the ideal approach (of 
an integrated treatment of robustness in the residual- and decision-making 
systems) and more direct than the trial -and-error method. Consequently, it 
will yield more insight into the general problem of robust FDI system design. 
In addition, this approach will provide the designer with a qualitative 
measure of the attainable level of robustness in the early stages of this 
design, and this will allow him to assess what be can expect in terms of 


overall performance. 
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Computational complexity is another important design consideration. 

Clearly, a practical system should only require a reasonable amount of 
storage and computation. An FDZ system that take into account detailed 
dynamical behavior of the syston is more cc»iq)lex but is likely to be more 
effective for a greater variety of failures than a system that does not use 
the same information. Furthermore, it may permit a reduction in hardware 
redundancy. In this study, the tradeoff effects eunong complexity, performance, 
and possible hudware redundancy are considered in the design of both the 
residual-generation emd decision processes. 

The goal of this research is to develop a methodology for designing 
FDI systems that takes into consideration the issues of performance, robustness, 
and computational complexity. Viewing the FDI process as consisting of two 
stages allows us to break up the FDI system design problem into two parts. 

We will examine the design of robust residual-generation processes and the 
design of high-performance decision-making proceeds separately. 

1.2 Overview of Thesis 

This thesis report basically consists of two parts, each dealing with 
one of the two stages of the FDI process. In Chapter 2 and 3 we will con- 
sider the design of residual-generation processes. The decision rule 
(decision process) design problem is the subject of Chapters 4,5, and 6. 

All residual-generation processes exploit some form of analytical 


redundancy - the relationship among sensor outputs and actuator inputs 
specified by the dynamics of the system under the no-fail situation, e.g. 
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the kinematic relation v«a. (when such a relation is violated, a failure 
among the components, i.e. sensors and actuators, involved in the relation 
must have occurred) . In order to facilitate the design of robust residual- 
generation processes, a thorough understanding of the concept of analytical 
redundancy and how it cam be exploited in deriving residuals is needed. In 
Chapter 2 we will present a characterization of analytical redundancy (for a 
linear time-invariant system in the absence of noise and modelling uncertain- 
ties) in terms of the concept of a parity space . We will describe several 
forms of residual-generation that are based on analytical redundancy, and we 
will discuss how such residuals can be used for FDI. 

In Chapter 3 we will consider the effect of modelling errors and noise 
on redundancy relations, and we will define a simple measure of such effects. 
Clearly, a residual-generation process is robust (or as robust as it can be) 
if it is based on the redundancy relation that is least vulnerable to noise 
and modelling eirrors. The choice of such a redundancy relation is formulated 
as a minimax optimization problem (aimed at minimizing the worst case effect 
of noise ajid modelling error) . Together with the viewing of analytical 
redundancy in terms of a parity space, the minimax design represent a new ap- 
proach to the problem of designing robust residual-generation processes. 

The design of a decision process involves resolving the tradeoff among 
detection performance issues such as expected detection delay, false alarm 
rates, and the various detection probabilities. We have chosen to examine 
this problem using the Bayesian approach with which the design problem can be 
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easily conceptualized, in Chapter 4, we will describe the Bayes formulation 
of the FOX decision problem and the optimal Bayes decision rule. Although the 
optimal rule is generally not computable, the structure of the Bayesian ap- 
proach can be used to derive practical suboptimal rules. We will consider the 
design of suboptimal rules based on the Bayes formulation in Chapter 5. 
Numerical algorithms for designing such rules and evaluating the associated 
performance indices (detection probabilities, etc.} will also be presented. 

In Chapter 6, we will report on our experience with this approach to designing 
decision rules through a numerical example and simulation. 

A brief sunmary of this thesis and a discussion of some futxure reseaurch 
directions are included in Chapter 7. 
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CHAPTER 2 

ANALYTICAL REDUNDANCY AND RESIDUAL GEWERATICW 

2.1 Introduction 

Hie first stage of the FDI process is the generation of residuals, 
and one of the goals of this research is to investigate the problem of 
designing simple and robust residual-generation processes. To date, this 
problem has not been dealt with directly for the general case, although it 
was successfully resolved for a particular application 111 ]. The present 
chapter and the next one are devoted to developing one approach to this 
general design problem. 

The first step towards our goal is to gain a better understanding 
of the concept of analytical redundancy - the basis for residual-generation. 
There are basically two forms of analytical redundancy: 1) direct redundancy- 

the instantaneous relationship among outputs of sensors, and 2) temporal 
redundemey - the relationship among the histories of sensor outputs and 
actuator inputs. Before we proceed to present a mathematical characterization 
of redundancy we will describe some examples of the two forms of redundancy. 

Direct redundancy exists among sensors whose outputs are algebraically 
related, i.e. the sensor outputs ere related in such a way that the variable 
one sensor measures can be determined by the instantaneous outputs of the 
other sensors. A simple example is the case of identical sensors, where 
we luive, in the absence of sensor noise, 


( 2 - 1 ) 
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%e general form of direct redundancy exists among a set of sensors that 
are modelled by linearly dependent in equation (1-2), e.g. a set of 

four accelerometers measuring acceleration in 3-space 1 12 ] . In such a case 
some fixed linear combination of the sensor outputs should always be zero 
(or close to zero when noise effects are included) in the normal mode. 
Alternatively, the ideal output of one of those sensors can be generated by 
a linear combination of the outputs of the remaining sensors, i.e. 


y 


1 


m 


I 

i-2 


Vi 


( 2 - 2 ) 


where the are constants. It is clear that the identical sensor case (2-1) 

is a specialization of (2-2). in the absence of a failiire, the ideal output 

calculated in this way should agree with tho observed output of the sensor. 

m 

That is, the residual y. (k) - J a.y. (k) should be zero. A deviation from 

i=2 ^ ^ 

this behavior provides the clue to a failure among the set of sensors. We 
note that, through direct redundancy, certain dissimilu sensors may, in 
effect, be compared. 

Direct redundancy has been exploited to generate residuals for the voting 
scheme for sensor FDI, where the "majority rule" principle is applied to 
detect and identify the failed sensors. (We will discuss voting in Section 
2.3). Examples of successful application of the voting method include [ 2 ] 

[ 3 ][ 12). The residuals of the voting system singly consist of 

weighted sums of sets of linearly dependent (instantaneous) soru^or outputs. 
Thus, direct redundancy based residual-generation is simple. However it has 


- 22 - 


two major disadvantages. A high degree of hardware redundancy is required 
to use the "majority rule" principle. In addition^ direct redundancy is not 
applicable tor detecting actuator failures. 

In contrast to direct redundancy, temporal redundancy is useful for 
both sensor and actuator FDI. Consider, for example, the temporal relation- 
ship between velocity (v) and acceleration (a) ; 

v(k+l) * v(k) + Ta(k) , k=l,2,... (2-3) 

where T is the period of discretization. Just as direct redundancy (2-2) 
provides the basis for comparing outputs of linearly dependent sensors, 

(2-3) prescribes a way of comparing velocity measurements with accelerometer 
outputs, i.e. the residual is r(k+l) = v(k+l) - v(k) - Ta(k). As a result, 

outputs from velocity sensors and accelercxneters can be compared in a mixed 
velocity-acceleration sensor voting system for detecting and identifying 
both types of sensor failures. 

Temporal redundancy facilitates the comparison of sensor outputs among 
which direct redundancy does not exist, consequently, a reduction in Iiardware 
redundancy for sensor FDI can be realized. Viewed in a different light, the 
use of analytical redundancy implies that additional sensor failures can in 
principal be detected with the same level of hardware redundancy. 

To see how temporal redundancy can be exploited for detecting actuator 
failures, let us consider the first-order model of a vehicle in motion: 


v(k+l) ■ av(k) + Tu(k), 


k=l,2 


(2-4) 
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K^ere v denotes the vehicle's velocity, and a is a scalar constant betvfeen 
zero and one reflecting the effects of friction and drag. T is the dis- 
cretization step, atvl u is the connanded engine force (actuator input) 
divided by the vehicle's mass. Now, the velocity measurements can be com- 
pared to the actuator inputs by means of (2-4), i.e. 

r(k+l) « v(k+l) - ov(k) - Tu(k). An actuator failure may be inferred, if 
the sensor is functioning normally but (2-4) is not satisfied. 

While the additional information supplied by dissimilar sensor outputs 
and actuator inputs at diff. rent times through temporal redundancy facilitates 
the detection of a great Vt.riety of failures and reduces hardware redundancy, 
exploitation of this additional information often results in increased com- 
putational complexity, since the dynamics of the system will have to be 
accounted for. Depending on the accuracy of the system model, the biggest 
drawback, however, could be the increased sensitivity to system parameter 
variations due to the dependence on the system dynamics - the robustness 
issue. 

From the above discussion, one approach to the design of robust 
residual-generation processes in any given application is evident! the various 
redundancies that are relevant to the failures under consideration are to be 
identified, and residual-generation should be based on the redundancies that 
are least sensitive to modelling uncertainties. This is the approach we 
will examine. 

In order to apply this design philosophy, we need: 1) a precise 

characterization of analytical redundancy, and 2) a quantitative description 
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of the effects of noise emd modelling uncertainties on the generation of 
residuals. We will examine the first problem in this chapter and the second 
problem In Chapter 3. 

In tlie next section, we will present a general formulation of the 
con';<.ip: of analytical redundancy In linear time-invariant systems. This 
formulation Is a generalization of the parity equations studied by various 
researchers (e.g. [21 at»3 the parity space discussed by Potter and Susan 
[13], and It provides a unified setting for discussing all approaches to FDI. 
In Section 2.3 we will discuss a generalized voting scheme for FDl, where 
residual-generation is based on the explicit forms of analytical redundancy 
described In section 2.2. In section 2.4 we will exfdnlne the effects of 
failures on the residuals generated from these explicit forms of analytical 
redundancy In order to understand how such Information Is used to detect 
failures in FDI schemes other than the voting method. In FDI systems such as 
GUI [4] and the detection filters of Beard [7] and Jones [8], residual-genera- 
tion is acccsnplished by means of filters, which do not utilize analytical 
redundancy In as explicit a form as In a voting system. Based on the Insights 
obtained in section 2.4 we will explore the role of analytical redundancy In 
the residual-generation process of these systems in section 2.5. 

2.2 Analytical Redundancy - 4 arlty Relation 

In this thesis we have focur.sed our effort on developing an approach 
to designing robust residui^l-generation processes for linear time-invariant 
(LTI) systems. In order to focus on the concept of redundancy we will base 
our discussions in this chapter primarily on an LTI system that Is in a 


-25- 


noi*«-free enviionment and for which we have an exact model. We will 
analyze the effect of noise in subsequent chapters as we consider the use 
of redundancy for the design of high-performaiice FDI schemes. 

Tlie system of interest to us here is characterized by the deterministic 

model 


q 

x(k+l) - Ax(k) + I b u (k) (2-5) 

j-1 ^ J 


y^ (k) ■ c^x(k) , j“l» • . • 


(2-6) 


where x is the n-dimensional state vector, A is a constant nxn matrix, 
b^ is a constant (colunm) n-vector, and c^ is a constant (row) n-vector. 

The scalar Uj is the known input to the j-th actuator, and the scalar y^ is 
the output of the j-th sensor. 

In order to facilitate the following discussion we introduce the 
following notation: 




CjA 




n^ * 0,1,... 

j “ 1 , . « • ,m 


(2-7) 


The well-known Caley-Hamilton theorem [ is ) implies that there is an n^, 
l£n^£n, such tliat 


rank C . (n . ) » { 
j 3 


n ,+l 
3 

n ,<n . 
3 3 

n , 
3 

n ,>n . 
3- 3 


( 2 - 8 ) 


The matrix CjCn^-l) characterizes that part of the system that is observable 
from the j-th sensor. Specifically, the null space of CjCn^-l), W(Cj(n^-l)), 
is knovm as the unobservable subspace of the j-th sensor, because any component 
of the state lying in W(Cj(n-l)) will not affect the output of the j-th 
sensor [15]- The rows of Cj(n^-l) span a subspace of R*' that is the 
orthogonal complement of the unobservable subspace. Such a subspace is defined 
here to be the observable s ubspace of the j-th sensor , and it has dimension 
n^ . The system (2-5) -(2-6) is observable (through the m sensors) if the 
sum of the m observable subspaces is the whole space R^. We will assume that 
the system is observable. 

In Subsection 2.2.1 we will characterize analytical redundancy in terms 
of the concept of a parity space and parity relations, and in Subsection 2.2.2 
we will discuss residual-generation schemes based on analytical redundancy, 
i.e. on parity relations. 


2.2.1 The Generalized Parity Space 


m 


Let ( 1 ) be a row vector of dimension n = (n.+l) such that 

j=l ^ 

1 m j — 

...w ], where uj , j=),...,m, is a (n^+1) -dimensional row vector. 

Consider a non-zero u satisfying 


r i 

lU) , . . .Cl) J 


Cj(n^) ^ 


L c (n ) J 

m m 


X = 0, VX e R 


(2-9) 


(Note that in the above t-quation Cj(nJ has n^+l rows while it has only rank 
n.. The reason for this will become clear when we discuss the temporal 


-27- 


redundancy associated with a single sensor.) Since the system is observable, 
there are only n-n linearly independent w's that satisfy (2-9). We let Q 
be an (n-n)x n matrix with a set of such independent u)'s as its rows. (The 
matrix is not unique.) Assuming all the u^'s are zero for the moment, we 
have the n-n linearly t .ependent parity equations or parity relations 
that are independent of the state x: 




V'j^(lc,n^) 


y (k,n ) 
m m • 


= P 


( 2 - 10 ) 


where 


/j()c,nj = 


Yj^(k) 




j=l, . . . ,m 


( 2 - 11 ) 


The (n-n) -vector p is called the parity vector . Under the ideal conditions 
set forth in the beginning of this section, p is zero. More generally, in 
the presence of noise and failures, p is a non-zero vector representing the 
inconsistencies among the sensor outputs. Different failures will produce 
different p's. Thus, the parity vector may be used as the signature-carrying 
residual for FDl. We will further discuss residual-generation based on 
parity equations in the succeding sections. 

The space of all (n-n) -dimensional parity vectors defined by (2-10) is 
called a parity space . We note that the parity space discussed ^bove 


is an extension of the parity space examined by Potter and Suman [ 13 ] to 
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include sensor outputs at different times, thereby, taking into account 

the dynamics of the system. In [ 13 ] , Potter and Suman exclusively studied 

equation (2-10) with n =n = ... =*n =0. We will exploit our generalized 

12 m 

notion of a parity space to characterize analytical redundancy. 

When the actuator inputs are not zero, (2-10) must be nx>dified to be 




— “ 


[ V'^(k,n^) 



U(k,nQ)> 

1 y (k,n ) 


B (n ) 

' m m _ 


m m 

/ 


where 





c.B * 0 

3 


n .-1 

c .A ^ B . . . . 
3 


0 


c.B 0 . . . 
3 



( 2 - 12 ) 


(2-13) 


B = [b ...b ] (2-14) 

1 q 

n„ = max[n, ,...,n ] (2-15) 

0 1 m 

u(k) = [u (k) — u (k) ] ' (2-16) 

1 q 

U(k,nQ) = [u’(k)...u'(k+nQ)]' (2-17) 


8,(nJ is an (n.+l)x n.q matrix (q iS the number of actuators). Equation 
3 j 3 0 

(2-12) defines the generalized parity vector p, and the (n-n) -dimensional 
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space of all such vectors is called the generalized parity space . Any 
linear combination of the rows of the left hand side of (2-12) is called 
a parity function . Note that (2-8) implies that we generally do not need to 
consider a higher dimensional parity space that is defined by (2-12) with 
n^ replace by n^ > n^, j-=l,...,m, although it is possible to do so 

It is now clear that (2-12) with p=0 (which is the case under ideal 
conditions) characterizes all the analytical redundancies cf the LTI system 
(2-5) and (2-6) , because it specifies all the essential relationships among 
the actuator inputs and sensor outputs. Each parity equation can be regarded 
as a redundancy relation, and it can be obtained by taking a linear combination 
of the rows of (2-12) . 

An important notion in describing analytical redundancy is the order of 
a redundancy relation. Let u be the vector of a particular parity relation. 


x.e. 


m 


j=l 


U)-’[K(k,n J 


B. (n.)U(k,n„) ] = 0 
1 : 0 


(2-18) 


where co"*]= o). We can define the order p of such a relation as follows. 

Since some elements of o) may be zero, there is a largest index n such that 

the n-th element of for some j is non-zero but the (n+l)th through the 

n,-th elements of each iiP are zero (or n.<n.) Then, p is defined to be p=n-l. 
3 3- 

The order p describes the "memory span" of the redundancy relation. 

For exaunple, when p=0, instantaneous outputs of sensors are examined. When 
p>0, at least some sensor outputs at times up to p steps in the past need to 
be considered in the parity equation, e.g. the kinematical equation (2-3) 
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is a first order parity relation- Hence, direct redundancy is characterized 
by p=0, while a temporal redundancy relation has a p>0. 

Based on the properties of observability subspaces we have developed a 
general characterization of analytical redundancy in terms of a parity space 
To illustrate the generality of this characterize tion we will excunine a few 
examples of redundancy relations. 

Direct Redundancy 

Direct redundancy is described by a zeroth order parity relation 


[coj o...oi...lu)Q 0...0]< 


( V^(k,n^) 


Bi(ni) 

(/(k.n^) > 

( y (k,n ) 


B (n ) 

) 

L m m J 


L in m _ 



(2-19) 


where is a scalar denoting the (i+l)-st elements of At least two of 

the must be non-zero for (2-19) to be a meaningful parity relationship. 

Because of the structure of 8.(n ) (see (2-13)), (2-19) can be written as 

1 j 


r 1 


y^(k) 




= 0 


( 2 - 20 ) 


In this case the parity function (left hand side of (2-20)) can be directly 


used as the residual. 
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A single sensor 

Due to (2-8) (Caley-Hamilton) it is always possible to find a non-zero 
such that under the ideal condition, (2-12) becomes 


oj-'[y'j(k,n J - BAn^)lHk,n^)] = 0 


(2-21) 


Expression (2-21) represents a form of temporal redundancy - i.e. it is the 

relationship among the histories of the j-th sensor output and the actuator 

input, and it is of order n^. Note that (2-21) is the lowest order parity 

relation involving only one (the j-th) sensor, and this is why we have chosen 

to consider Cj(n^), as opposed to (n^-1) , in defining the generalized 

parity space. Parity relation (2-21) prescribes a consistency test that 

requires comparing to zero a linear combination of a window of sensor j outputs 

and the actuator inputs. Such a combination (the left hand side of (2-21)) 

can be used as the residual r(k) . Since this test involves only one sensor, 

it may be used as a self-test for sensor j, if 8,(n.)=0 or if the actuators 

12 

can be verified (by other means) to be functioning properly. Similarly, it 
can be used to detect actuator failures when sensor j can be verified to be 
normal. The Caley-Hamilton theorem implies that a self-test redundancy such 
as (2-21) always exists for sensor j. 

Equation (2-21) can be alternatively written as 


y. (k) = -(o)^ ) ^ 

^ n . 

D 


n , 


U) 

L n.-t 
1 


y . (k-t) 
1 


n . 

- 5 oi u(k- 


t) 


t=l n,-t 
2 


( 2 - 22 ) 
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t^ere 


[a^ ... 0...0] - 

nj-l 


(2-23) 


and <J^, t=0, . . . ,n .-1, i? a q-dimensional row vector; t»0,...,n. 

is the (t+1) -th component of oj^ , and u(k) is the q-dimensional actuator 

input vector at time k. Equation (2-22) represents an auto-regrersive 

moving-average (ARMA) nodel for the j-th sensor output. It is only a 

n . 

moving average (MA) if c^A ^=0. Under the ideal condition, the value of 
Yj at time k can be predicted from the past values of and actuator 
inputs using (2-22) . The residual defined by taking the difference between 
the left and right hand sides of (2-22) is indeed the difference between 
such a prediction of y^ (k) and the observed y^(k). Hence r a non-zero residual 
will provide the clue for a (sensor j or an actuator) failure. 


Tengx?ral redundancy between two sensors 

A tenoral redundancy exists between sensor i and sensor j, if there are 


i r i 

0) - t<*>o*”“fl^-l 


(2-24a) 


01 

satisfying the redundancy relation 


( y (k,n. )1 8 (n ) 

i j ' - ^ 

“ ^ ( [y^(k,n^)J [8j(n.) 


(((k.n^) I 


(2-24b) 


(2-25) 


Equation (2-25) is a special case of the general form of parity equation 
(2-18) with sfti, sji<j. The relation (2-25) is of order p< nax[n^rn^] . 

Clearly, (2-25) holds if 






(n^-1) 





(2-26) 


Now, the rows of Cj^(n^-l) and Cj(n^-l) span the observable subspaces of 
sensors i and j, respectively. Hence, (2-26) implies that a redundancy 
relation exists between two sensors if their observable subspaces overlap. 

/V A 

Furthermore, when the overlap subspace is of dimension n, there are n 
linearly independent pairs that will satisfy (2-26) . Therefore, 

there are as mauiy independent redundancy relations of the form (2-25) as the 
dimension of the overlap subspace. 

Because the order of the redundancy (2-25) is p, either U)^ or o)^ must 


be non-zero. Assuming 
model for y^: 


we can write (2-25) in the form of an AHMA 

P 


Fj (k) 




Lt-i 


„y. (k-t) + 

p-t-'p 


t-o 




t-i 


(a^ .+0^ )u(k-t) 

p-t p-t 


1(2-27) 


where we have used the notation (2-23) . (Note that the summation of the i-th 
sensor outputs ranges from 0 to p but those of the j-th sensor outputs and 
actuator inputs aure from 1 to p.) Note that, viewing (2-27) as an ARMA model 
for y^, we see that y^ plays the role of an input, just as do the actuator 
inputs u. 
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Similar to the single sensor ease, (2-27) Indicates that (k) can 
be predicted from a linear conbination of the i-th sensor outputs 
(y^(k-p) ,...,y^(k)), the j-th sensor outputs (y^ (k-p) , . . . ,y^ (k-1) ) , and the 
actuator inputs (u(k-p) , . . . ,u(k-l) } . The kinematical relation between 
velocity and acceleration measurements (2-3) is a parity relation expressed 
in the form of (2-27). The parity function (the left hand side of (2-25)), 
which represents the difference between the observed and predicted sensor 
outputs, can t>e directly used as the residual. 

In susmary, we have conceptualized the notion of analytical redundancy 
in terms of a generalized parity space. We have also illustrated how various 
redundancy relations can be obtained from this parity space and how these 
relations may be used in forming residuals. In the next subsection we will 
further discuss residual-generation based on parity relations. 

2.2.2 Residual Generation Based on Parity Relations 

In the preceding discussion we saw that parity functions can be used 
as residuals. These residuals may in turn be used in a voting syston for 
FDI. We will discuss the voting scheme in the next section. In the remainder 
of this section we will describe other methods for generating residuals based 
on parity relations (t«&poral redundancies) . We will mainly use the kinematical 
relation (2-3), tdiich is a first order parity relation involving tt#o sensors, 
to illustrate these mechanisms of residual -generation, but the basic concept 
can be readily generalized to higher order cases involving more sensors and 


actuators 
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For easy reference we re-write (2-3) here 

v(k+l) - v(k) + Ta(k) (2-28) 

A direct method for generating a residual r^(k) is as follows 

r^(k) - v(k) - v(k-l) - Ta(k-l) (2-29) 

This is an ex«if>le of the type of resid\tal described in the preceding sec- 
tion that involves direct calculation of a parity function as in equation 
(2-18). In a noisy environment, r^(k) is a random sequence. In the absence 
of a failure it is zero mean. When a failure occurs it Isecomes biased 
(possibly for only a short period of time as we shall see below) . It is by 
detecting the presence of the bias in x^k) that a failure can be inferred. 

Mow consider a velocity sensor failure that manifests itself as a constant 
bias in the v-measiiren^nt . Suppose this failure occurs at time T. Then 
r\k) will contain a bias at time T, but it will become zero mean ^or k>T. 

That is, the failure signature vanishes after one time step. (This is 
because the sensor bias effect is cancelled out via the term: v(k) - v(k-l) 
in (2-29)). Thus, if this failure is not detected at time T, r^(k) defined by 
(2-29) will not provide any clue of this failure after time T. 

Fortunately, another way of using the parity relation (2-29) to generate 
useful residiuls is available. The ARMA representation (2-29) of the 
klnMtatical relation is^lies that with an initial observation of the 
v-measurement, say v(0), v(k) , k>l,2,.. can be predicted using the accelera- 

1 


tion measurements only. Such a predicted v(k) can Ise stibtracted from the 
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^••rved v(k) to form the residuals r^(k): 

v(k+l) - v(k) Ta(k) (2-30a) 

r^(k) - v{k) - v(k) (2-30b) 

where v and v denote the observed and the (open-loop) predicted velocity 

measurements, respectively. Since no velocity measurement is used in the 

prediction other than during the initialization, the bias effect of the 

2 

failure (its signature) will be present in r (k) for k^T. In addition, a 

2 

constant accelerometer bias will produce a ran^ in r but only a constant 
bias in r^. The possible drawback of this scheme is that noise effects 
(due to the accelerometer) are accumulated. Therefore, it is useful for 
cases where the noise accianulated over the failure-monitoring period is 
small. Alternatively, when the noise level is low, this sch^oe can be applied 
with periodic re-initialization of the velocity prediction process. A 
variation of this residual -generation scheme for accelercaieter failures was 
used with success in the aircraft sensor FDI problon [111. 

A third residual-generation scheme may be devised using the parity 
relation (2-29) in a closed-loop fashion. Based on the ARMA representation 
(2-29) a filter for the velocity can be constructed: 

v(k+l) - v(k) + Ta(k) + hr3(k) (2-32a) 

r^(k) - v(k) - v(k) (2-32b) 

A 

where v denotes the closed-loop prediction of the velocity measurement, and 
h is the filter gain (0<h<l). The filter residuals r^(k) also represent 
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the dlffercnca between the observed and predicted (or expected) velocity 

measurenent, and hence, can be used as residuals for PDl. Hie advantage of 
3 2 

using r over r is that the filter gain can be chosen so that the variance 
« 3 

of V (hence that of r ) will not grow Indefinitely with tine. As a result, 
the periodic re-initialization of the prediction process can be eliminated. 

The tradeoff, for exang>le, is that the signature contained in r^ for a velocity 
sensor bias failure, will vanish with increasing elapsed time k-T, and that 
an accelerometer bias will lead to a steady state bias, not a rang>, in r^. 

This residual-generation scheme is used in PDI schemes such as the GLR [4 ] . 

In sumnary, we have described three ways a t«Rporal redundancy or parity 
relation may be used to generate residuals for FDZ. (Note that for direct 
redundancy only the first method i.e. r^, is available since no dynamics 
are involved). Generally, r^(k) is the residual of the instantaneous com- 
parison of the left hand side and right hand side of the parity equation 

2 

(2-27), and it is dependent on y^ (k-p) , . . . ,y^ (k) . The residual r (k) is the 
difference between y^ (k) and its predicted value that is c(»g>uted from 

2 

y^ (0) , . . . ,y^ (p-1) , u(t), and y^(t), i^j, t-l,...,k using (2-27). Thus r 

effectively represents a dynamic comparison (since all past u and y^ are 

2 12 

used via the dynamics (2-27) in forming r ) . In contrast to r (k) , r (k) 
depends on y^ (0) , . . . ,y^ (p-1) , and y^(k) but not on y^ (k-p) , . . . ,y^ (k-1) . The 

third type of residual r^(k) is the innovations of the filter of y^(k) based 

2 3 

on (2-27) . Similar to r , r is based on a dynamic comparison. Moreover, 
r^(k) depends on y^ (k-p) , . . . ,y^ (k-1) just as r^(k), albeit in a closed-loop 
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manner. Th^se residuals can be directly used ii; a voting schme, which 
will be discussed in the next section. Other methods for exploiting the 
failure infozTttation contained in the parity space will be explored in 
Sections 2.4 and 2.5. 

2.3 The Generalized Voting Scheme 

In this section we will describe how parity relations can be utilized 
in a generalized voting scheme (GVS) for FDI. Other usage of parity 
relations for FDI will be discussed in the next two s*fertions. For sim- 
plicity we will assume in this section that only one failure can occur, 
but extension of the following idea to the case of simultaneous failures 
is straightforward. 

The structure of a generalized voting system based on M parity rela- 
tions is shown in Figure 2-1. This FOI system basically consists of M 
tests each of which serves to determine if one of the M parity relations is 
violated. Each test has its own residual-generation process that is based 
on a single parity relation. If the underlying parity relation is a direct 
redundancy, the residual is simply the parity function. If the parity 
relation represents a temporal redundancy, then residual-generation may 
take on one of the three forms discussed in Subsection 2.2.2, i.e. the 
residual may be singly the parity function, the difference between the 
true observation and the open- loop prediction of a sensor output, or the 
difference between the true observation and the closed-loop prediction of a 
sensor output. A decision rule is applied to the residuals associated with 


RESIDUAL 'if** DECISION P*''**^ 



FIGURE 2-1; The Generalized Voting Scheme. 
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each parity relation to determine if the corresponding parity relation is 
violated. A different daciiion rule may be used for each of the M parity 
relations. Typirallyr the ^^isi«» rule employed in a voting system takes 
the f nn of a threshold tests on a single point of residual or the moving 
average of a window of residuals. However, more sophisticated rules such as 
the sequential rules ^amined in Chapter 4,5, emd 6 may be applied to make 
better use of the failure information contained in the residual for high- 
performance. c ...he last stage of the GVS, the voting logic (to be described 
in the following) is applied to the outcome of the M consistency tests to 
detect and identify the failed ccmponent. Next, we will describe the voting 
logic . 

In order to apply GVS, we need a set of parity relations with the 
property that each ccxnponent (i.e. a sensor or an actuator) of interest is 
included in at least one parity relation and each component is excluded from 
at least one of the parity relations. (If there are M coi^nents, the 
number of parity relations considered is H or more. However, later in this 
section we will see that with a slight modification of the logic described 
below, as few as M-1 parity relations can be used.) When a coH^nent fails, 
all the parity relations involving it will be violated, while those excl\:^ing 
it will still hold. This means that the components involved in parity rela- 
tions that hold can immediately be declared as unfailed. Moreover, the one 
conponent that is cannon to all of the violated parity relations is then 
readily identified as failed. This is the basic idea of generalized voting 
and is also tlie logic used by GVS to detect and identify failed components. 

It differs from the common notion of voting in the sense that through analytical 
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redundancy (parity relations) dissimilar components (incloding sensor and 
actuators) may vote together. We note that the voting systepi for linearly 
dependent sensor studied by various researchers [ 2 ] [ 3 ] I 12] aiul the 
aircraft sensor FDI syston [ 11] 2 ure special cases of GVS. 

In the ranainder of this section we will discuss seme important consid- 
erations Involved in designing a generalized voting syst^. We will do so 
by means of a second order (n»2) example: 


A * 

•ll 
. 0 

®12' 

*22 

(2-33a) 

b = 

[0 

1]' 

(2-33b) 

c^ =» 

[1 

0] 

(2-33c) 

= 

[0 

1] 

(2-33d) 


In this case, n^^=2, n^®!, n-n=3, and there are (only) tiucee independent 
parity equations: 

yj^(k)-{a^j^+a22>yj^0t-l) + (2-34) 

yiOc) - aj^j^yj^(k-l) - (2-35) 

y 2 (k) - a 22 y 2 (Jt-l) - u(k)-0 (2-36) 

These parity relations can be applied in a GVS for detecting failures in 
the sensors and the actuator, because each of the three terms y^, y^, and u 


i ir. mimt mrt. . . 
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is Inciuded in two parity relations and excluded from one. Since all the 
parity relations represent tenoral redundancy, any of the three forms of 
residual-generation (see Subsection 2.2.2) may be used. I^ee isqportant 
issues concerning the design of a GVS are examined in the following. 

1) The output y^ aj^ars in (2-35) and (2-36) with different time 
lags (emd so does u in (2-34) emd (2-36)). Hierefore, a sensor 2 failure 
will violate the parity relation (2-35) one tine step later than (2-36) 
(regardless of the fonr. of residual -generation used) . A decision process 
rr>sponding quickly to this effect will declare an actuator failure. But this 
is erroneous. If one more time step is considered, both parity relations %<oiild 
be violated, eind the correct failed component (sensor 2) can be identified. 
Although this type of transient behavior will disappear (for open-loop 
residuals, it will disappear in less than n steps, where n is the dimension 

of the system) , it suggests that temporal behavior of the residuals should 
be carefully considered in designing decision processes that can respond 
quickly and accurately. 

2) UtMier the assunqition that only one failure can occur, only two of 
the three parity relations (2-34) -(2-36) are needed for FDI. To see this, 
consider only (2-34) and (2-35). An actuator failure affects only (2-34) 
and a sensor 2 failure affects only (2-35), while a sensor 1 failure affects 
both (2-34) euid (2-35). Thus, the voting logic can be modified to recognize 
these failure phenomena, and IDZ can be acccxnplished based on two parity 
relations. In fact, it is easy to see that any ccaabination of two of the 
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above three parity relations may be ei^loyed to generate residuals for 
use with the modified voting logic for FDI, and we hove a choice among 
three coobinations.* In the deterministic cose with an exact model, all 
such conibinaticms will serve equally well, and we may use any one of tiiese 
combinations of parity relations. However, in the presou^ of noise and model- 
ling uncertainties, all the parity relations are obscured, albeit to different 
extents. Thus, residual-generation should be based on parity relatiox^ that 
are least vulnerable to such adverse effects. This design approach is the 
focus of this part of our research and it will be fully considered in Chapter 3. 

3) While some systems have more parity relations than needed for voting, 
others may have less than the necessary number. For instance, si^^se in the 
above exan^le %#e replace the single actuator with two actuators characterized 
by ® [1 1] ' and *= [-1 1] , respectively. This new configuration will 
not change n sirce A and c^ remain unchanged, and there are three paurity 
relations: 

y^^ (k) - (aii+«22^^1 '^11*22^1 ^*22"*12^ “l ”“l " 

(a22+ai2)»2^*^”^^‘*^2^^"^^“® (2-37) 

y^(k) - a^^j^y^^ (k-1) - aj^ 2 y 2 (k-l) - u^(k-l) + u^Ck-D^O (2-38) 

72 (k) - - U2(k-l)-0 (2-39) 

These three paurity relations aure inadequate 

* A system may inherently have more candidate paurity relations than required 
for GVS (with or «d.thout the modified logic). For example, suppose 1) 

in (2-33d) . Then ii^2, n-n-4, i.e. there are fcnir ii^ependent paurity 

equations, while only three (two for the modified logic) are needed. 


- 44 - 


above three peurlty relations may be aaployed to generate residuals for 

use with the modified voting logic for FDI, and ve have a choice among 

* 

three c«Bbinations . in the deterministic case serve equally well, with 
an exact model all such condsinations will and we may use any one of these 
combinations of parity relations. However, in the presence of noise and model- 
ling uncertainties, all the parity relations are obscured, albeit to different 
extents. Thus, residual-generation should be based on paurity relations that 
are least vulnerable to such adverse effects. This design aj^roach is the 
focus of this part of our research eund it will be fully considered in Chapter 3. 

3} While sane systons have more parity relations than needed for voting, 
others may have less than the necessary number. For inst 2 mce, suppose in the 
above example ve replace the single actuator with tvo actuators characterized 
by bj^ = [1 1] ' and b^ = [-1 1] , respectively. This new configuration 
will not change n since A and C remains unchanged, and there are three parity 
relations : 


yi (k) - (a^^+a^j) y^ (k-1) <k-2) + (a22-a^2^ (k-1) - <a22+aj.2 ^ "2 




(2-37) 

y^(k) 

- aj^j^yj^(k-l) - “ Uj^(k-l) + U2(k-1)»0 

(2-33) 

y2<k) 

- a22y2(k-l) - Uj^(k-l) - U2(k-1)=0 

(2-39) 


Note that (2-39) is the same as (2-36). Thfse relations are inadequate 


* A system may inherently have more candidate parity relations than required 
for GVS (with or without the modified logic). For example, suppose U 

in (2-33d) . Then n2=2, n-n=4, i.e. there are four independent parity 

eqviations, %dille only three (two for the modified logic) are needed. 
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for GVS, vrhich requires four relations for four con^nents. (^ese relations 
are also in-sufficient for use with the modified logic described earlier, 
because a sensor 1 and an actuator 2 failure will both violate all three 
parity relations.) Therefore, other FDI sch«nes that exploit the failure 
information carried by the residuals in ways different for GVS will have to be 
used. He will exeunine these methods in the next two sections. 


2.4 Failure Characteristics in Parity Space 

The generalized voting scheme discussed in the previous section repre- 
sents one method for using one form of the failure information contained in 
the residvials for FDI. In this section we will examine other methods of 
exploiting this information to detect and identify failures, and we will 
contrast them with GVS. We will primarily consider (open-loop) residuals 
that are generated using parity functions, i.e. the residual vector is simply 
a parity vector. In Section 2.5 we will discuss the case with (closed- loop) 
residuals generated by filters. 

First we will consider sensor FDI using direct redundancy. Based on 
direct redundancy, the residual vector r(k) has the form (y is m-dimensional) 


r(k) - «y(k) (2-40) 

Each row of the right hand side of (2-40) is a parity function. For GVS, 

Q is chosen such that for each sensor (j) there is at least cne component of 
r that is dependent on y^ and one component that is independent of y^ . When 
a sensor j failure occurs, all residual components depending on y^ will 
beccxoe non-zero, while all other components remain zero (assuming no noise) . 
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The mechanism of the GVS involves examining each individual congponent 
separately for a bias. Based on the location of the biases in r, a failure 
identification is made. 

Another way of using the failure information contained in the residuals 

is as follows. A faulty sensor, say the j-th one, contains an error signal 

* 

V(k) in its outputs 

y^(k) = c^x(k) + v(k) (2-41) 

When this sensor fails, we have (assuming no noise) 


r(k) » ft^V(k) (2-42) 

where Q. is the j-th column of Q. That is, no matter what v(k) is, the 
effect of a sensor j failure on the residual always lie in the direction 

Thus, is the failure direction in parity space (FDPS) corresponding 

to sensor j. (In [ 13 1 , is referred to as the j-th n»asurement axis in 

parity space.) If an Q can be chosen such that all its columns represent 

* 

distinct directions in the parity space, then a sensor of failure, j»l,...,m, 
can be inferred from the presence of a bias c<»nponent in the residtial along 
Clearly an 0 suitable for voting satisfies this condition. Generally, 
there may exist an Q with as few as two rows and with columns pointing in m 
distinct directions in the parity space. This approach to senr,.-'r FDI was 
studied by Potter and Suman 1 13 ] and Daley et al 1 16 ] . 


* The columns of 0 are, in fact, linearly dependent (but possible distinct) 
because there eure at most m-n^ linearly independent parity functions (rows c 

Q) while there are m columns in n. Here, m is the number of sensors, and n 


is the rank of C ( c is* a matrix with as its rows, and n^ is the number of 


linearly independent sensors) . 
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A FDI systen based on recognizing FDPS examines all the residual 
components together , because the failure signature being looked for is a 
particular direction in parity space not restricted to one of the coordinate 
axes defined by individual parity relations. This FDI scheme (called the 
FDPS method for brevity) is different from the GVS in that the latter exasdnes 
each residual ccm^nent separately. These two schemes actually exploit 
different aspects of the failure information carried by the residuals. This 
difference cem be illustrated by a simple example described below. Suj^se 
that the residual r(k) is a 3-vector given by (2-40). To provide a basis for 
c(xi^ring GVS with the FDPS method we assume that Q is chosen such that r is 
suitable for use in both FDI methods. Let us further suppose that the second 
con^nent r of the residual is independent of y . In order to detect a 

* A 

sensor 2 failure, GVS will look for a bias in r^ and a bias in r^ (see Figure 
2-2a) . lo detect the same failure, the FDPS methods will search for a signature 
in the direction the plane (see Figure 2-2b) . In this case the 

FDPS is defined by a precise combination of the biases in rj^ and r^, and it 
represents a more detailed characterization of the failure signature (information) 
than the biases considered by GVS. It is by the exploitation of this detailed 
information that the FDPS method can, at least in theory, detect and identify 
m sensor failures using a 2-dimensional r, i.e. H has two rows but m distinct 
columns . 

These two forms of failure information are also used by GVS and the 
FDPS method when tenoral redundancy is employed. In such cases GVS still 
examines each residual component separately for the presence of a bias. 
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Homver, the failure effect *s not necessarily confined to a fixed direction 
in parity space. To illustrate this, consider a residual vector r(k) based 
on the parity equations (2-37)- (2-39) . We can vnrite rOc) as 



* a 


» m 


1 0 0 



r(k) - 

1 0 0 -JJ 

0 0 0 1 -a«« 


yiOt-2) 

y2<Ji) 


22 


,y2(ic-i)_ 


• « 


* ■ 

*22’*12 ■^*22'^*12^ ^ 
-1 0 10 


u^(k-l) 


Uj^(k-2) 

U2(k-1) 

-1 0-10 


U2(k-2) 

■ a 


• ^ 


(2-43) 


When sensor 2 fails, its output is described by (2-41) and the residual 
becomes 





“ 


0 


0 

r(k) - 

0 

v(k) + 

-*12 


1 


”*22 _ 


v()c-l) 


(2-44) 


Unless v(k) is a constant function of tire, the effect (signature) of a 
sensor 2 failure is only confined to a 2-dimensional subspace of the parity 
space. It is easy to see that this is also true for the other three coag>onents. 
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From the above observation, we can conclude that if a conponent is 
involved in raore than one component of the residual (parity) vector and a 
window of more than one point in tine of the signal associated with this 
conponent is used in generating the residual vector (i.e. if temporal 
redundancy is used) , then the signature of this cong)onent failure is generally 
constrained to a nulti-dimensional subspace of the residual space. Now each 
failure is associated with a subspace in the parity space. These subspaces 
in general may overlap with one another, or some may be contained in others. 

If no such subspace associated with a failure is contained in another, FDI 
can still be performed by determining which subspace the residual lies in. 

(We note that the detection filters of Beard [ 7 ] and Jones [ 8 ] effectively 
act in a closed-loop fashion to confine the signature of an actuator failure 
to a single direction and that of a sensor failure to a 2 -dimensionaI sub- 
space in the residual space. We will discuss this in the next section) . Frcn 
(2-43) , it is easy to see that the subspaces associated with the four coiaponents 
(2 sensors and 2 actuators) are all 2 -dimensional but no two of them are the 
same. Hence, FDI can be acccxnpllshed using the mechanism described above, 
iidiereas the GVS would not be applicedsle in this case (see Section 2.3). 

The two forms of failure information discussed above represent time- 
independent failure characteristics, i.e. they are not dependent on detailed 
models of the (time history of the) failures as we have not assumed anything 
abcxit the nature of v(k). The tesg>oral information buried in the residuals 
is also useful. It may be used to determine the nature of the failure or cs 
added information to distinguish failures. For exeusple, consider a sensor 
failure with signature described by (2-44). Under this failure, r(k) generally 
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traverses a 2*4inenslonal subspace. If r(k) is observed to vary mly 
along (0 -a^^ ^'*22^ ' ' deduce that a cmstant bias has developed 

in this sensor. Turning this around, if ve model the failure that %»e wish 
to look for as a bias, then we can look along the specific direction 

t 

(0 -a^^^ l-a 22 l* Also, if we model v(k) in any other parametric form, 

e.g. as a ttaap, we will specify a specific type of teiaporal trajectory 
for the signature. It is this type information that is used in the (OJt 
algorithm, and it is this type of failure signature characterisatim that 
will form the starting point of our investigation of decision rules in 
Chapters 4,S, and 6. 

To see how tenoral Information can help in distinguishing failtires, let 
us consider r(k) given by (2-43). The subspaces associated with the two 
actuators are clearly not the same, but they overlap with each other. In a 
noisy environment it may be difficult to determine %ihich subspace r(k) belmgs 
to, especially when a major cong>onent of r(k) lies in the overlap. Mow, 
suppose we have models for hew the cwo failed actuators would behave over tfsm. 
Then, we can determine the (tenoral) signature of these failures (i.e. r(k) 
under each failure assuming no noise) . Since these signatures describe the 
tee^ral behavior of the residual (the direction of r!k) for each k) under the 
corresponding failures, they represent more detailed characterisation of 
failure information. As a result, distinguishability of these two failures 
can be io^roved by using a scheme tluit looks for these signatures in the 
residuals (by means of correlating the residuals with the signatures) . Indeed 


this is the basis for the (SLA method. 
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Prom th« above dlacusaion we can that in -irder tc exploit tenoral 
information > failure nodela axe required. Moreover, decision processes making 
use of the temporal behavior of the residuals are more ccHsplex, because the 
failure signatures (now time fuiictions) must be stored or generated on-line. 
Correlation of the residuals with the failure signatures will also add to the 
coi^utational ccn^lexity. 

In sutnnary, we have described several forms of information contained 
in residuals generated from parity functions which are exploited for PDl. The 
sinqJlest form is that employed by CVS. Failure directions (and subspaces) 
in parity space are time- independent failure characteristics utilized by 
several FDZ schemes. Temporal information concerning failures, if available, 
can provide even more useful information for detecting failures, although 
there is a cost in additicxial system conq)lexity. 

2.5 FOI SystMts Using Filters 

In the previous section we discussed how the information contained in 
the residuals generated by parity functions is utilized for FDZ. There is 
a large class of FDZ systems sucn as GLR [ 4 1 and the detection filters of 
Beard [ 7 ] and Jones [8] that use filters to generate residuals for FDZ. zn 
these residual-generation processes, analytical redundancy is not used in 
the same explicit form as in the processes discussed in the last section. 

Here we will attempt to determine its relationship with the aetbods discussed 


in Section 2.4. 
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The general form of the filter we will consider is given by the 
following equations 


x(k+l| 

k) ■ Ax(k|k) 4 Bu(k) 

(2-45a) 

x(k+l 

|k+l) » x(k+llk) + Kr(k+1) 

(2-45b) 

r(k+l) 

- y(k+l) - Cx(k+1 k) 

(2-45c) 


%d)ere x(k|t) is the estimate of x at k given y(l) . . . . (t) , t£k> C is a 
matrix whose rows are the c^'s; y is the vector of the m sensor outputs; 

K is th^. filter gain that is chosen differently in different FDl scheraes, r 
is the filter innovations and which are the residuals used for -FDI. Since the 
prediction xCk+ljk) is based on the system dynamics (2-5) the filter has 
already made use of temporal redundancy of the system. Note that CxCk+ljk) 
is the prediction of y(k+l) based on y (1) , . . . ,y (k) , the residual given by 
(2-45c5 is a vector analog of the closed-loop residual (r’’) discussed in 
Subsection 2.2.2. In contrast to an open-loop residual-generation process 
(vdiose main purpose is FDI) , a filter actually serves two functions; 
to provide an estimate of the stato in the normal mode (no failure) and to 
generate residuals for FDI. 

In the absence of a failure, the innovations are given by 


e(k+l) 

- A€(k) 

(2-46) 

r(k) 

- C€(k) 

(2-47) 

e(k) - 

x(k) - x(k|k-l) 

(2-48) 

A - 

AII-KC] 

(2-49) 
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A is the closed-loop filter matrix, and € is the error of the state estimate. 
Under an actuator failure, the residual is characterized by 

e(k+l) = A6(k) + fz(k) (2-50) 

and (2-47) , where f is a vector (corresponding to associated with the 
failed actuator) and z(k) is some scalar time function representing the 
temporal characteristics of the failure, sflien a sensor j failure occurs, 
the residual beccxnes 

e(k+l) = Ae(k+1) + (AK) . v(k) . (2-51) 

r(k) = ce(k) + e.va) (2-52) 

where (AK) ^ is the j-th colunui of the matrix product AK, e^ is the 
m-dimensional vector with the j-th element being one aiid all remaining 
elements equal to zero, and v(k) is the scalar time function representing 
the tCTiporal characteristics of the failed sensor. Therefore, a coog»nent 
failure affects the residuals through the matrices A and C. 

Note that A depends on the filter gain K (2-49) . As a result, the 
failure effect on r can be controlled to some extent by a choice of K. For 
different FDI schemes K is chosen to achieve different effects. For example, 
in GLR K is chosen so that (2-45) is a Kalman filter, i.e. r(k) is a zero 
mean white sequence under the no-fail condition. With this choice of K, 
failures generally do not produce special effects such as fixed directions 
in the residual space. For hypothesized z(k) or V(k) , the failure signatures, 
wnich are time functions, can be calculated, tae GLR scheme achieves FDI 
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by recognizing such signatures contained in the residuals, ai^ it makes 

heavy use of the tenoral information. 

* 

The detection filter [ 7 ] [ 8 ] for an actuator failure has a special 
filter gain that makes CA^f, i=0,l, . . • ,n-l, lie in the same direction as 
Cf. Thus, this actuator failure produces a directional effect on r, and 
the detection of this failure cam he accomplished by checking the residual 
for a component in this direction. For a sensor failure, the gain of 
the detection filter is chosen such that the failure signature is constrained 
to a 2-dimensional subspace of the residual space. (Because of the term 
(AK)j V(k) , which actually depends on the gain, and e^v(k) in (2-51) and (2-52), 
it is generally only possible to choose a K so that a sensor failure signature 
is confined to a 2-dimensional subspace [8]). Therefore, the detection fil- 
ters produce residuals carrying directional signatures, which eure similar to 
those of the open-loop residuals discussed in the last section. 

In summary, we note that residual-generation using filters is based on 
temporal redundancy. However, failure signatures are directly affected lay 
the choice of the filter gain. In GLR the gain is chosen to whiten the 
residual, and the failure signatures are arbitrary time functions. Consequently 
the GLR schemes relies on temporal information for FDl . In a detection filter, 
directional failure signatures are produced in a closed- loop fashion via a 
proper choice of the f liter gain, and the FDI mechanism is similar to that 
used with the open- loop residuals with directional signatures (see Section 2.4). 
Therefore, similar types of failure information may i>e produced via open- loop 

* The original work of Beard and Jones concerns the continuous time piroblem, 
but the theory readily extends to the discrete time case where the system 
matrix T. is invertible. Also, it is possible to design a single detection 
filter for detecting several failures. 


- 56 - 


or closed- lc»p residual-generation process. 

The discussion included in this section represents a preliininary 
effort in trying to understand how closed-loop PDI systons use analytical 
redundemcy. Further research in the future is required for a thorough 


understanding of this subject. 
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CHAPTER 3 

raSIGW OF ROBUST RESIDUAL GENERATICTI PROCESSES 

3.1 Introduction 

In Chapter 2, we presented a unified view of analytical redundancy in terms 
of generalized parity space, and we also discussed how parity functions can be 
directly used to generate residuals in the open-loop fashion for FDI. When the 
system model is exact zuid there is no noise disturbance. Chapter 2 provides 
the frennework for obtaining the exact parity equations relating the various 
sensor outputs and actuator inputs. However, modelling uncertainties and noise 
effects will corrupt t'te parity relations. Thus, residual-generation process 
based on any noninal deterministic system model will be to some degree sensitive 
to modelling errors emd noise. In this chapter, we will examine the problem of 
designing residual-generation processes that are robust in the presence of model- 
ling oincertainties and ixiise disturbance. 

Thoi^hout this chapter, we will assume that the structure of the system 
under consideration is known to have the form (2-5) -(2-6), £uid we aure only 
uncertain of the exact values of some of the elements of the system matrices. 

That is, we have the following system model that includes both modelling uncer- 
tainties and noise distiurbances, 

x(k+l) = A(Y)x(k) + f b.(Y)u.(k) + C(h) (3-D 

j=l 3 ^ 

y^(k) = Cj(Y)*(h) + rjj(h), j“l» »« (3~2) 

idiere y is the vector of N uncertain parauneters of the model , amd %ie assume 
that Y®r where F is a Icnown range of parameter values (Y6FC R ) • 
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The dependence of h, and on y indicates that their elonents may 
be (different) functions of the parameter vector. This allows the modelling 
of elements in the system matrices as uncertain quantities that may be 
dependent on one another. The vectors C and n independent, 

zero-mean, «hite Gaussieui noise vectors with constant covariance matrices 
Q(^0) and R(>0), respectively. 

In this chapter we will concentrate on the problem of determining 
optimal parity functions, where "optimality" will be defined in terms of 
a measure of how large a deviation from zero could occur in a given p6u:ity 
relation in the presence of model uncertainties. Parity relations deter- 
mined in this mcurmer can be directly used in an open-loop PDI system, and 
our technique essentially provides the optimum design for this application. 

Of course, these parity relations can also be used to define ARMA models 
based on which closed- loop filters can be constructed to generate residuals. 
Although our design is aimed directly at optimizing the robustness of such a 
open-loop algorithm, one would generally expect that a parity relation that 
is robust open-loop could also be good for closed-loop residual-generation. 

The problem of determining optimal parity relations for closed-loop residua) - 
generation should be investigated in the future, and our work provides the 
framework for such an investigation. 

Before we proceed to describe the nature of the design problon at hand, 
it is useful to define the structure and the coefficients of a parity function. 
Recall that a parity function is a weighted combination of the actuator inputs 
and sensor outputs (see (2-18)). The structure of a parity relation refers 
to the set of input and output terms and the associated sets of time lags 
for each that are included in the parity function. For example, consider the 
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parity relations 


y^(k) 

+ 7yj^(k-l) + 32y2(k-l) - 0 

(3-3) 

y^Ot) 

- .lyj^(k-l) + 5y2(k-l) - 0 

(3-4) 

y^oo 

+ 3yj^(k-l) + 4y2(k-l) - .Olu^(k-l) * 0 

(3-5) 


Relations (3-3) and (3-4) have the same structure, because both parity 
functions include yj^(k), y^^(k-l) and y 2 (k-l); (3-3) and (3-5) have dif- 
ferent structures, because (3-5) include the additional Uj^(k-l) term. 

The coefficients of a parity function refer to the (non-zero) coefficients 
of the sensor output and actuator terms in the parity function. For 
example, the parity coefficients of (3-5) are 1,3,4, and -.01. 

A redundemcy relation is specified by a parity structure and a set of 
peurity coefficients. Any parity function (p) can be written in the form 

p *= av(k) - 6u(k) (3-6^ 

where Y(k) and 0(k) are vectors consisting of the sensor outputs and 
actuator inputs included in the parity function, respectively; the row 
vectors a and B represent the coefficients of the sensor output amd 
actuator input terms in the parity function, respectively. Under the ideal 
conditions of a deterministic systems whose parameters are known exactly, a 
and B contain the non-zero elements of u and aiB of (2-12) . For example, 
we have the following Y(k), U(k), a, and B for (3-5) 
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y^(k) 
y2(k-i)_ 

U(k) - tu^(k-l)] 

a - [3 14] 

B = [.01] 

Therefore, in the above notation, the components of Y(k) and U(k) define 
the parity structure, and a and 6 represent the parity coefficients. 

We now proceed to describe our approach to the design of robust 
residual-generation processes. This approach is best illustrated in terms 
of eui exeunple. Recall the three parity equations corresponding to the 
2-dimensional exas^le (2-30) considered in Section 2.3. 

yi(k)-(ai^+a22)yi(k-l) + (k-2) - aj^2“(>:-2) » 0 (3-7) 

yi(k) - a^^yj^(k-i) - ° 

y 2 (k) - a 22 y 2 (k-l) - u(k-l) = 0 (3-9) 

In Section 2 . 3 we indicated that only two of the above parity relations 

need to be used in a generalized voting system to detect amd identify a 

single component failure. When the elements of A, i.e. a.., a and a 

XX X f ^ A 

are perfectly known and there is no noise, all ccsnbination of two of the 



£dx)ve parity functions will serve equally well for generating residuals. 
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When the exact values of and a^j are not known, (3-7) -(3-9) 

only specify the structures of three parity functions whose coefficients 
are uncertain quantities. In order to nudce use of any of these parity struc- 
tures, we have to determine an appropriate set of coefficients for it. 
Ideally, the value of a parity function is zero in the absence of a failure. 
Under noise disturbances and modelling uncertainties, any choice of parity 
coefficients will result in a non-zero value for the parity function even 
when there is no failure. Hence, a natural design objective is to choose a 
set of paurity coefficients that will make the parity function, in scane sense, 
as close to zero as fossible in the absence of a failure. Such a choice of 
coefficients effectively minimizes some measures of the effects of noise and 
modelling error on the parity functions. We will call this minimized measure 
of adverse effects the parity error (and we will define it precisely in 
Section 3.3). 

When we have chosen the parity coefficients for all three paucity struc- 
tures, we will have also determined their corresponding paurity errors. Then, 
it is clear that the residual-generation process should be based on the two 
parity functions that provide the largest failure signature to parity error 
ratios. 

From the above discussion, it is evident t)iat our design approach 
consists of three steps: 1) identify the useful parity structures, 

2) determine the coefficients and the parity errors for these paucity 
structuj:es, and 3) .termine the signature to parity error ratios that aure 
used in deciding which parity functions are to be used for open-loop 
residual-generation. We will examine these three design steps in Sections 


3.2, 3.3 and 3.4, respectively. 
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The parity 'roefficient design problem will be formulated as a minimax 
optimization (Section 3.3). The solution to such a problem is generally 
difficult. In Section 3.5, we will discuss a simple numerical example from 
which we can derive some useful insight into the solution procedure of the 
general minimax optimization. Finally, a summary of our approach to the 
design of robust residual-generation processes is included in Section 3.6. 

3 . 2 Parity Structures Under ^fc>delling Uncertainties 

In this section, we will discuss how to obtain parity structures when 
there is uncertainty in the system model. First, we will review how a paurity 
structure is obtained under the ideal condition (exact model and no noise) . 

A parity function is determined from a set of linearly dependent rows of 
Cj (n>l) , j»l,...m. Let C be the matrix composed of this set of dependent 
rows. Corresponding to these rows there are the components of 
l/j(k,nj), j*l,...,m, which are collected together to form the vector Y(k) , 
emd there are the rows of 8j(nJ, j=l,...,m, which are collected into 
the matrix B> Thus we can write a parity function as 

P = w[y(k)-8U(k,nQ)l (3-10) 

with o)00. Note that p is used in this chapter to denote a scalar parity 
function. Since not all components of U(k,nQ) are necessarily involved 
in a parity relation, we may collect all the non-zero columns of B into a 
8 and the corresponding components of U(k,nQ) into U(k). Then we have 


p » wY(k) - (i)8u(k) 


(3-11) 
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whlch is in the form of (3-6) . The structure of this parity function is 

•v ■*» 

defined by the components of Y(k) and U(k). In terms of the notation de- 
fined above, we have 

Y(k) -Cx(k) +BU(k) (3-12) 

In the presence of modelling uncertainties and noise, (3-12) becomes 

Y(k) » C(y)x(k,Y) + ♦(Y)5(k) + n(k) + 8(Y)U0«) (3-13) 

where 

C = (C(k) ... C(k+p)]’ (3-14) 

amd p is the order of the parity function. If the i-th ccanponent of 
Y(k) is y (k+H ), then 

S X 


T). (k) • r) (k+£. ) 
1 S X 


(3-15) 


and the i-th row of 4, i.e. 4>(i) , is 


£l-l 

♦(i) - (c (Y)A (Y) ... c (Y)0...0] 
s s 


(3-16) 


and 4 has p columns. It follows from (3-1) and (3-2) C and T) are 
independent, zero-mean Gaussian randcmi vectors with covariances 


E{^k)C' (k)) - Q 


(3-17) 


E{n(k)n* (k)} - R 


(3-18) 


-64- 


and the (i,j)-th element of R ia 





(3-19) 


where 6. . ia the Kronecker delta function, R ^ ia the (a,t)-th el«nent 

X- » *2 

of R, and we have assumed that (k) - . Note that x(k,y) is a 

random vector that is uncorrelated with C(k) and fj(k) and 

E{x(k,Y)} - x^(k,y) (3-20) 

E{(x(k,Y) - XQ(k,Y)nxOt»Y) - XQ(k,Y)r} - E^(Y) (3-21) 


vdtere £ [y) is the steady state covariance of x(k,Y)> which is dependent 
on Y through A(Y) and B(Y). In (3-13) we have also explicitly shown the 
dependence of C and B on the parameter y. 

Now we will consider parity structures under modelling uncertainties 
and noise effects. When ( 3 - 13 ) holds, the rows of C (Y) » v^ich are chosen 
based on some nominal value Yq of y, may not be linearly dependent for some 
other YSr. Even if they are linearly dependent for all Y^F, for any choice 
of the parity coefficients, p (3-11) is generally not zero in the absence 
of a failure. This is because u satisfying <i)C(Yq)* 0 does «uc generally 
in^ly wC(Y)* 0 for all yeF. However, the parity structure of (3-11) is 
useful if we can find a set of parity coefficients that will make p close to 
zero under the no-fail condition. From this point of view, it is not 
necessary for a parity structure to be- based on a C that is composed of 
linearly dependent rows of Cj(n^+1), j=l,...,m, as long as we can determine 
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a —t of appropriate coafficlentB that will reault in a p that 1« do— 
to zero thara la no failure . Then, the procedure of obtaining a 
parity structure described in the beginning of this sectiM can also be 
affiled to a C that is exposed of aom arbitrary set of roi^ of 
C^(n^<t>l), The usefulness of such a parity structure d^nds <m 

the choice of coefficients which will be considered in the next section. 

Using the above reasoning, we can obtain many candidate parity struc- 
ture for residual-generation. However, not all of these structures are 
useful for a particular FDZ syst^. Consider a generalised voting systw, 
for instance. He need a set of parity functions ^hat satisfy: 1) all 

components of interest must be included in at least one parity function, 
and 2) all cosgonents (possibly except one) must be excluded from at least 
one parity function. Requirements such as these help in limiting the number 
of candidate structures to be considered. In most applications, special 
feature of the system matrices will provide additional insights in the 
choice of parity structure. We will not address this problem further, but 
will focus on optimizing the set of parity coefficients once we have chosen 
a parity structure. Then we can use the results of this optimization to 
cosg>are the usefulness of different parity structures. 

3.3. Design of Parity Coefficients 

Here, we will examine the problem of determining an appropriate uet 
of coefficients (0,8) for a given parity structure (3-6) 


p “ OYOc) - BuOt) 


(3-22) 
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Whan hava noiaa and uncartaintlaa in tha ayatan modallad by (3-1) (3-2) . 

In tha following, wa will daacriba a fornulation of thia daaign taalc aa a 
miniatax optimisation problam. 

Conaidar tha parity function (3-22) . Undar modelling uncertaintiaa 
and noi&a, p is generally not zero for any c)«>lca of a and It is in 
fact a function of and Substituting (3-13) in (3-22) we have 

p(a,6,Y,x(k,Y) fU(k)) ■ aIC(Y)x(k,Y)+^’(Y)C(*')+n(k)+8(Y)U(h)] 

- 6U(k) (3-23) 

Ideally, for (3-23) to be n parity function, p must be zero under the 
no-fail condition. Therefore, in order for p to be a useful parity function, 
we have tho choose a and 6 such that p is close to zero in the absence of 
a failure. 

Due to the noises ri, and the random state x(k,y),p is a random 
variable. A convenient quantity indicating the magnitude of p is 
E{p (a, 6,Y»x(k,Y) »U(k) } , where the expectation is taken with respect to the 
joint probability density of x(k,Y),C(k), and n(k) (assuming the pareuneter 
vector has the value Y) • Define 

e(a,g,x (k,Y*), I (Y*)»U(k)) • max E{p^ (a,6,Y,x(k,Y) ,U(k) } (3-24) 

° * ',er 

where Y* is the value of Y that solves the maximization, and Xq(X,Y*) and 
E^(Y*) are the mean and covariance of x(k,Y*) (3-20), (3-21). The quantity 
e can be interpreted as the worst (sffect of the modelling error and noise 
on the Parity function p. Then, we can attempt to achieve a conservative 


design by solving 
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min e(a,0,X-(k,Y*) , I {Y*)»U(k)) (3-25) 

a,6 O x 

As it stsnds the minimization problem (3-25) is not meaningful. Since 
p is linear in a and (3-25) has a trivial solutions O'-O, 0*0. A parity 
fsmction primarily relates the various sensor outputs > i.e. a parity func- 
tion always includes sensor outputs but does not necessarily include actuator 
input (e.g. a direct redundancy relation). Therefore, a must not be zero. 
Without loss of general) ':/, we can restrict a to be unit magnitude. The 
actuator input term in a parity function may be regarded as serving to make 
the parity function zero (c<xnpare (2-10) and (2-12)). Then, B is essentially 
free. However, we will now show that for each value of U(k), B actually 
only has a single degree of freedom. For a given U(k) , any 6 can be vnritten as 

B - Xu* (k) + z (3-26) 

where X is a scalar, and z* is a vector lying in the subspace orthogonal 
to U(k), i.e. zU{k)««0. Hence, the component z in 6 will not produce any 
effect on p. This implies that we only have to consider B of the form 

(3-27) 


(3-28) 


6 - Xu* (k) 

where X represents the only degree of freedom of B. 

Now we can construct u meaningful, optimization problmn: 

min max E{p^(G,B,Y,x(k,Y) ,E (Y)»U(k))} 
u,B Yer * 

s.t. <aa**l 
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Using a 3 of the form (3-27) and (3-17), (3-18), and (3-23), we can mite 
E{p^} as 

E{p^(a,X,Y.x(k,Y),U(k))}=[aX)S(Y,x-(k,Y),E (Y) ,U(k) ) laXj * (3-29) 

^ X 

where S is the synmetric, positive-definite matrix for all Y 


S(Y.XQ(k,Y) ,U(k)) = 


S 

S 


11 

^12 

21 

®22 


(3-30) 


Sn = C(Y) IXo(k,Y)x^(k,Y)+Z3j(Y)lC’ (Y) +«>(Y)^*(Y) +R <3-3D 

+ B(Y)U(k)U' (k)B' (Y) + C(Y)xQ(k,Y)U' (k)B* (y) 

+ 8(Y)U(k)x^(k,Y)C' (Y) 


®12 “ ®21 “ -VfB(Y)U(k) + C (Y)xQ(k,Y) ] 



W = [U* (k)U(k)] 


(3-32) 

(3-33) 

(3-34) 


Note that S is dependent on x„(k,Y) and £ (y), the mean and covariance 

0 X 

of the system state, given y i? the true parameter vector. In general, 
XQ(k,Y) and are very complex functions of y. As a result, the 

maximization of E{p } over y is very difficult to perform. However, the 
problem may often be simplified by a reasonable approximation discussed in 


the following. 



f 
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The principal feature of a parity function is to relate the outputs 
of different sensors and inputs to actuators at different points in time. 

The matrices C* and B, which contain the dynamics of the system, repre- 
sent the dcsninant feature of the parity relation. From this vantage point, 
the primary effect of the uncertainty in y is manifested through C, and 
B. Thus, it seems reasonable to approximate x^{k,y) and 

by Xgth) and corresponding to some nominal value of y since the effect of 
Y on e through x^lkiy) and is indirect and only of secondary 

importance. With this approximation, the dependence of S on Y is simplified, 
euKi the minimax problem takes the form 


] 


3 

i 

3 

1 

3 

i 

i 


min max [aX]S (y,x„ \k) , Z ,u(k))[aX]' (3-35) 

a,x ver “ 

s.t. oa'=l 

1 

1 

J 

Despite the fact that the objective function of (3-34) is quadratic | 

in a and X, it is generally very difficult to solve, because S may depend j 

on Y arbitrarily. In Section 3.5, we will discuss a simple exan^le for 
which a solution procedure has been developed. There, we will also report 
some insights into the solution of the general problem obtained from this 
example. 

i 

We let a* and X* denote the values of a and X that solve (3-35), i 

with 6* = (k) , and 

e*(x„(k),E ,u(k)) = e(a*,e*,x-(k) ,E ,U(k)) 

Ox Ox 




(3-36) 
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We call e* the parity error of the parity fvinction 

p* « a*Y{k) - B*U(k) (3-37) 

Hence, e* is a measure of the (minimiz^id worst case) effect of modelling 
error and noise on (3-36). From the viewpoint of our design objective 
(3-28) e* measures the fittness of (3-37) as a paurity function. 

As an aside, we note that a and 6 are not further constrained such 
that aC(y)=0 and B = aS(Y) for sc»ne value yeV. This is because even if 
they were, the resulting parity error would not be smaller than that of 
(3-36). In addition, just as for o* and 6*» the a and B satisfying aC(Y)«0 
B = aB(Y) do not satisfy and6=a 8 (Y^^.^^) in general. We 

now rettirn to the ainimax problan. 

Note that the minimax solution a* and X*(B*) are dependent on Xq(J^) 
and U(k) . In general, this means new coefficients would have to be con 5 >uted 
at each time step if we wanted to continually achieve the optimum. This is 
clearly an undesirable requirement. Very often a set of coefficients will 
work well for a range of conditions, i.e. for x in some region of the state 
space. Therefore, a practical approach is tc schedule the coefficients ac- 
cording to the operating condition. Each operating condition can be repre- 
sented by a set-point, which is characterized by some nominal state x^ and 
Uq that are independent of time. Parity coefficients can be pre-coo^juted 
(by solving (3-35) with x^ and in place of x^Ck) and U(k)) and stored. 
Then, the appropriate coefficients can be retrieved for use at the cor- 
responding set-point. When the state and the inputs are varying slowly. 


-71- 


this scheme of scheduling coefficients is especially likely to deliver 
performance close to the optimum. In the remainder of th.is chapter we will 
focus on the design of parity coefficients for given set points. 

The above formulation of the parity coefficient design problon may be 
slightly generalized to account for the fact that the true actuator input 
may not be exactly. To acconqplish this we will let the actuator input 
term U(k) be + 6u(k) in the minimax problem (3-28). The term is the 
set-point input (yielding the set-point state x^) and 6u(k) is a randexa 
process that may be used to model two effects. In a true set-point operation, 
the set-point is often maintained by means of feedback through a ccxnpensator . 
Thus, the actuator input contains and 6u(k), which represents the 
variation in the input due to feedback. Based on the structure of the com- 
pensator, an expression for 6U(k) can be derived and subsequently used in 
(3-28). We note that 6u(k) in this case will be correlated with the state x. 
Using such an input in (3-28) will lead to additional terms in the S matrix 
that are the covariance of 5u and cross covariance of 6U and x. Since U(k) 
is not a fixed term, $ will no longer be of a single-degree-of freedexn but 
con^letely free. As a result we will have to consider the full vector B in 
the minimax problem instead of the scalar variable X as in (3-29) . However, the 
basic form of the design problem is not altered. 

In the case where the set-point design is used in the parity coefficient 
scheduling scheme described earlier, 6u(k) may be used to model deviations 
from U^ that are due to time-varying inputs used to change the state, such as 
in a manuevering vehicle. This will allow us, to some degree, to account for 


- 72 - 


the fr.ct that the input is not necessarily fixed at in the minifflax design. 

For simplicity, we may model 6u(k) here as a white Gaussian process with 
covariance that is uncorrelated with the state (although more con^lex 
models cw be used) . Again, the inclusion of such a 6u will not change the 
structure of the minimax problem, which will be the same as in the previous 
case. 

In this section, the miniioax design method was developed based on the 
assumption that y is unitnown. If a probability distribution over F may be 
obtained (or postulated) , an alternative design formulation of the parity 
coefficient design problem is possible. Namely, instead of minimizing 
max £{p we can consider minimizing £{p }, where E denote expectation 

yer 

with respect to the joint distribution of x(k), f), ^nd Y- Then, we are 
looking at the averaged effect of Y on p. The design problem simply becomes 
a constrained quadratic minimization (it is essentially an eigenvalue- 
eigenvector problem), and it is simpler to solve. Detailed investigation of 
the merits of this approach is left for a future study, 

3 . 4 Choice of Parity Functions for Op e n-Loop Residual-Generation 

In the last section we presented a method for determining a set of parity 
coefficients for any given parity structure and given nominal set-point x^, 

Uq that is best in the sense that it minimizes the maximum mean square value 
of the parity function under the no-fail condition. After applying this method 
to the candidate pcirity struc cures, we have a set of camdidate parity functions. 
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In this section, we will discuss the criteria for choosing an appropriate 
set among the candidate parity functions for use in open-loop residual- 
generation processes. 

Associated with each candidate parity function (3-28) is a parity 
error e* (3-35). Since p* is linear in a* and 0*, the magnitude squared 
of the combined vector of parity coefficients (o.Bl scales the parity error. 
Therefore, the parity errors associated with the candidate parity functions 
can be compared if they are normalized. We define the normalized parity 
error e 

e* = e*/| ! I I (3-41) 

where 

|l[a*,0*]|| = {[a*, 0*1 [a*, (3-42) 

= 1 + 0*(B*)' 

and the normalized parity coefficients 

5* = aVll[a*,B*]!! 

6* = BV||la*,B*l|! 

Then, we can consider normalized parity functions: 

p* = a* Y(k) - 6* U(k) (3-45) 

It is now clear that normalized parity functions that have the ssMllest normalized 
parity errors are preferred, because they are close to being ideal parity 


(3-43) 

(3-44) 


functions. 
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However, this is not the only issue that must be considered in 
designing robust FDI systems. Note that the usefulness of a parity function 
as a residual depends on how visible failure effects are in con^rison to 
the inherrent parity error. To illustrate, consider an exas^le in which we 
assume an additive failure and we let g denote the additive terns that ap- 
pear in the normalized parity function under this failure. (Note that g 
is the signature of this failure amd it is dependent on a* and B*.) We 
define the signature to parity error ratio tt as 

IT - (3-46) 

Suppose we have to choose between two parity functions for the FOl of 
a particular failure. Then, we should choose the one that gives a bigger tt. 

For instance, consider the parity structure (3-8) (3-9) of the exaiq>le discussed 
in Section 3.1. Suppose we have determined the normalized parity coefficient 
for these structures, and we have the parity functions 

Pj^ = .778 y^(k) + .545 y^(k-l) + .311 /^(k-l) 
p^ = .592 y^(k) - .474 y 2 (k-l) + .652 u(k-l) 

If the failure we are interested in detecting is a bias of magnitude v in 
sensor 2, the signature g^^ and g^ corresponding to pj^ and p^ are 

- .311V 


g^ * .118V 
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and 

ir^ - .31lv/IeJ]^ 

where e* and e* are the normalized parity errors of and p^. Dierefore, 
for detecting a bias in sensor 2, we should choose Pj^(P 2 ) 
i.e. if ej <(>)6.92 e^. 

In sunnuury, parity functions with small normalized parity errors are 
usually preferred. In considering the detection and identification of 
particular failures > the signatiire to parity error ratios should be compared. 
The requirement for using ir is that the nature of the failure will have to 
be given. In the above exansple, the failure of interest is sinq>ly a bias in 
the second sensor. 

3 . 5 A Numerical Example 

In this section, we consider the coefficient design problem for a 
simple example. He will develop a sinq>le solution procedure for this prolan, 
and we will also discuss the relation between this solution method and the 
solution of more general coefficient design problems. 

The Numerical Example 

^e system under consideration is a 4-dimensional system operating at 
a set-point with two actuators and three sensors. The values of the system 
matrices are shown in Table 3-1. Except for two elements of the A matrix. 
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.5 -.7 .7 0 

0 • ® 0 

-1 0 0 .1 

0 0 Yj .4 


B = 


0 

1 

0 

0 


0 

0 

1 

0 . 


Cj^ = ro 0 1 oj 

- 10 X 0 0] 

Cj “ 10 0 0 1] 


Yj^e r.02, .2] 


NOMINAL Y^ = -1 


y^e I-.2, -.1] 


NOMINAL y^ ” -.15 


TABLE 3-1 ; System Parameters 




I 
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all other parameters of the system are known exactly. These two uncertain 
elements are assumed to be two independent parameters denoted by Y 2 * 

The ranges of these parameters are also shown in Table 3-1. The system is 
stable over the specified range of parameter value. In addition, there are 
plant and sensors noises, but we will describe them when we discuss the 
numerical results later in this section. 

Suppose we want to design a generalized voting system for detecting a 
sensor failure. Three candidate parity structures are 


Pi 


y2(k) 


y2(k+l) 

yj(k) 


(3-47) 


p 


2 


y2(k) 

y^Ot) 

yj^(k+i) 

y. (k+2) 


p 


3 


y3(k) 

y^Ck+i) 
yi(k) _ 


(3-48) 


(3-49) 
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The corresponding # and C matrices are shovm in T 2 ible 3-2. Note that each 
C(4) depends linearly on either Yj ^2 

linearly dependent for any value of y^. The parity structures under con- 
sideration do not include any actuator inputs due to the fact that c^^B, 
c^B, c^AB, and CjB are all zero. This will not cause any severe restriction 
in the following discussion, because the absence of the actuator inputs does 
not change the structure of the minimax problem (3-35) . Assuming only a 
single sensor failure may occur, only p, plus either Pj^ or p^ need to be used 
for residual-generation (because both Pj^ and include sensors 1 and 2) . 

Solution Procedure 

Here, we will discuss the procedure for determining the coefficients for 
the above parity iitructure and also discuss several direct extensions sug- 
gested by this pTocedure. We will discuss its relation to the solution of the 
more general coefficient problem at the end of this section. 

Since the three parity structures are of the same form (i.e. p<»C(Y(k)), 

we can consider one generic problem characterized by C, a, etc. 

(without the index associated with the particular parity structure) , We 
will let Y denote the scalar parameter that appears in C (since C is dependent 
on only one parameter in each of the three parity structures) . The minimax 
problem is of the form 

min max a S{y)a,' (3-50) 

® YGIY^/Y^l 

s.t oa'*l 



/ 


so 


wh«re the dependence of S on the set-point le suppreeeed> end end 'Y^ 

denote the minimum and maximum of the parameter vaxiation. Substituting C 

and ^ of Table 3-2 into the expression for S in (3-33) > can easily see 

that S is quadratic in y (because C and S are linear in y) , and S is positive' 

definite for any value of y. Hien, oiS(Y)a' is convex in y. In order to see 

2 * 

this, we can write as(y)a' • ^ ‘*■'' 1 *^ where Wj* Wj^» Wq are 

scalars dependent on a.) It follows that the maximum of aS(y)a' for any value 

1 2 

of o occurs at either y or y , and (3-50) becomes 

min max (f,(0),f (a)] (3-51) 

O 12 

s.t. oo'«l 

where 

f^(a) - as(y^)a', i-1,2 (3-52) 

To gain some insights into the solution of (3-51) , let us consider a 

2-dimensional version of this problem, i.e. suppose that S is 2x2. In 

Figure 3-1. we have shown the ellipses f ^ (o) * s^, i*l,2 in the a-plane. 

As s^ increases, the ellipses grow in size. Recall that a is constrained 

to be magnitude one, and this conF'tzaint takes the form of a unit circle in 

the a-plane. Along the unit circle we can trace the value of f^(a) , and 

this is shown in Figure 3-2. T)iere are basically two cases, nie first case 

is when the minimum of both f^ and f^ (along the unit circle) fall below 

the other function (see Figure 3-2a) . It is easy to see that the solution 

of (3-51) has to be at one of the intersection points a^ or 0^, i.e. 
11 2 2 

fj^(o ) ■ ^ 2 ^- ^ ^ Figure 3-2a, the minimax solution is 
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1 12 

clearly at a , since fj^(a )< (Q ) • Th® second case is v^en the minimum 

of either or remain above the other function (see Figure 3-26). Then 

it is clear that the minimum a* is the miniaax solution. 

Ihe above reasoning can be easily extendeo eo higher dimensional 

problems, i.e. S is of arbitrary dimension, as long as S(y) is quadratic 

in y. In fact, it can be easily shown that the solution of (3-51) can be 

obtained by a two-step procedure (see Appendix A) . To simplify the descrip- 

''1 >'2 

tion of this procedure, we need to define a , a and A in following manner 

f^(a^) = min f^(a) i=l,2 (3-53) 

a 

s.t. cia'=l 

A* {ct: f^(a) = Ota’=l) (3-54) 

i. 

since = oS(y )a(S is tymmetric and rxisitive-definite) , a is simply 

the eigenvector of S(y^) corresponding to the smallest eigenvalue. 

Tne two-step solution procedure for (3-51) is : 

1) The first step involves chec)iing the conditions 

f^(a^> f2(aS (3-55a) 

f2(a^)> fj(a^) (3-55b) 

-'I ''2 

If both a and a satisfy (3-55) , then the a giving a smaller f^(a ) is 

*Jie solution of (3-51) . If only one a satisfies (5-55) , it is the solution. 
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If neither satisfies (5-55 ) , then we have to perform step 2. 

2) Search over the set A (3-54) to find an a that minimizes f^(o) or 
(since over A, fj^(a) = This is equivalent to solving a 

quadratic programming probl^ with quadratic constraints: 


min f (a) (3-56) 

a 

s.t. oa*=l 

alS(Y^)-S(Y^) ]o =0 


Ihe solution of ( 3-56) is the minimax solution to ( 3-51) , if step 1 fails 
to produces a solution. The minimization may be solved numerically using 
existing optimization techniques I 14 ] . 

The above solution procedure can be readily extended to the case where 
S is a quadratic function of an N-vector Y and each element of Y is indepen- 
dently bounded by an interval (i.e. F is a hyper- rectangle) . Then the 
minimax problem (3-51) becomes 


min meui f . (a) 

® i=l,...,2” 


(3-57) 


s.t. cta'=l 


where f^(a) = ctS(Y^)ct', cuid Y^» i=l,...,2** denote the 2** comers of F. 

The above solution procedure is modified as follows. In Step 1# we have 
to consider the minimax of all f^(tt) . That is, we have to replace (3-55) by 


f.(a^) 

1 


min 


j»l,. . . ,2 


N 


f^(a^), i=l,...,2” 


(3-58) 
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In step 2, we will have to solve (3-56) for all con^inations of f^(a) and 
f j (a) r . Then the minimum of all the solutions of the quadratic programs 

is the minimax solution. 

This solution method can be extended to handle parity structures with 
actuator inputs. Of course, the corresponding S matrix has to be qxiadratic 
in Y our approach to be valid. In Appendix a, we will discuss such an 
extension to include actuator inputs for the scalar case. The N-parameter 
case with input terms can be treated in the same way as the N-parameter 
case with no input term was treated in the above discussion. 

Numerical Results 

Ihe minimax design problem has been solved for a set of six test con- 
ditions consisting of different set-points and different plzmt and sensor 
noise intensities. These test conditions are described in Tables 3-3 and 
3-4. The two set-points I-4.16 7.03 4.06 -1.01]' and [4.06 2.90 5.80 

-1.45]' can be obtained by applying Uj^=l and U2=10 to the nominal system 

rl t.2 

model. The ncaninal state covariances £ and £ due to the two different 

X X 

1 2 

hypothesized plant noise intensities Q and Q are listed in Table 3-5. 

A con^uter program based on the penalty-multiplier method Il4l for 
solving non-linear optimization problems is used when it is necessary to carry 
out step two of this solution procedure. In addition to the sywastry (i.e. if 
a is a candidate solution of (5-56), so is -a), there are generally local 
minima for (5-56) . In order to obtain the global minimum, a large number of 
initial guesses may have to be used in the minimization program. For our 
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25 

0 


0 


L 0 


0 

0 

0 

0 


0 0 “ 
0 0 
0 0 

0 .25- 


2 

Q 


.25 0 

0 .5 

-.325 0 

0 0 


.325 0 

0 0 

.625 0 

0 .25 


0 


R 


0 


= 1,2, i»l,2,3 


TABLE 3-3; Itoise Variances 


CORD. CODE PARAMETERS 


a 

*0 “ 

[0 0 0 
DIAG R 

0] 

= [1 

1 1] 



= 

[-4.16 

7.03 

4.06 

-1.01] 

b 

0 







DIAG R 

= [1 

1 1] 




[4.06 

2.90 

5.80 

-3 .453 

c 

1 






Q » 

DIAG K 

= [1 

1 11 




[4.06 

2.90 

5.80 

-1.451 

d 








DIAG R 

= [1 

2 2] 



x^ = 

[4.06 

2.90 

5.80 

-1.451 


0 





e 

. 

DIAG R 

* [2 

1 11 


f 

X * 

*0 

[4.06 

2.90 

5.8 

-1.451 



DIAG R 

- [1 

1 11 



TABLE 3-4: Test Conditions. 
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.5580 

.0342 

-.1508 

-.0552 

1 

r = 

.0342 

.0102 

-.0129 

-.0097 

X 

-.1508 

-.0129 

.5772 

.0117 


.-.0552 

-.0097 

.0117 

.3113 


1.958 

-.8434 

-1.114 

-.1049 

2 

-.8434 

1.803 

.7691 

-.1996 

Z K 

X 

-1.114 

.7691 

2.608 

-.1081 


-.104 

L 

-.1996 

-.1081 

.3829 

TABLE 3-5 

: Ncxninal 

E 




X 
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/si *2 

problems, we always used either a or a (corresponding to the minlsia of f^^ 
and f^) as the initial guess for the second step, and the algorithm has always 
converged to an e* quite rapidly. Other initial guesses were also tried, but 
they either led to the same solution or higher e values. Therefore, we may 
conclude that we have determined the global minima for our problems. The 
resulting coefficients ^u:e tabulated in Table 3-6. 

For this exaaple, it is evident that the parity coefficients are 
generally strongly dependent on the test condition (the veU.ues of x^, Q, and 
R) . Although this dependence is very coi^lex, some insights ma.y still be 
obtained from the numerical results. Consider, for insteutce, p^ under con- 
ditions b and c. From Table 3-6, we have for condition b the parity function 

p* « .6411 y,(k) - .7666 y,(lc+l) + .0378 y, (k) (3-59) 

anid for condition c 

p*^ = .8947 y2(k) - .3667 y2(k+l) - .2551 yj^(k) (3-60) 

The only difference between these two test conditions lies in the value of 
Xq (see Table 3-5). Since the first and fourth column of (Table 3-2) 
are zero, only the second and third elements of (x ^2 will play 

a role in the coefficient optimization problem. The parity function p^^ can 
be written in the form of (3-26) ; 

h ■ V02 * * “S'OS * 


(3-61) 
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TEST 

COND. 

PAR. 

FUNC. 

e 

a* 


1 

1.002 

.7282 

-.6808 

.0791 


a 

2 

1.008* 

.9983 

.0223 

.0483 

.0219 


3 

1.118* 

.6833 

-.7208 

-.1167 



1 


.6411 

-.7666 

.0378 


b 

2 


.4462 

.5079 

-.4356 

.5942 


3 

1.210 

.7027 

-.7115 

1 

. 

o 



1 

1.096 

.8947 

-.3667 

-.2551 


c 

2 

1.055 

.9599 

-.1484 

.1992 

.1296 


3 

1.230 

.7592 

-.6504 

.0249 


j 

1 ! 

1.908 

.7865 

.3023 

-.5385 


d 

2 

1.123 

.7345 

-.5931 

.4697 

-.6559 


3 

2.228 

.7981 

-.6007 

.0684 



1 

1.124 

.8058 

-.5832 

-.1025 


e 

2 

1.122* 

.9669 

-.1204 

.1242 

.1875 


3 

1.230 

.7441 

-.6678 

.1692 



1 

1.427 

.7327 

-.6803 

-.0166 


f 

2 

1.311* 

.5146 

.4404 

-.3312 

.6570 


3 

1.254 

.6385 

-.7687 

.0375 



Solution obtained in Step 1 of 
solution procedure 


TABLE 3-6; Minijnax Parity Coefficients and Parity Errors 
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v^ere and denote the eloaents of a corresponding to 

y^dc-t-l), auidy^(k) respectively, and C denotes the ronaining noise terns. 

It is clear that and modulates the effect of on p^. Qualitatively, 
as bectnnes large relative to 1 x ^21 noise covariances the 

same), the optimal will reduce in size (relative to and a^) in order to 
keep the effect of small. As increases, the signal to noise ratio 

of y^(k) also Increases. Therefore, we ei^ct |a^| to becooe large so as 
to make better use of the infonaation provided by y ^ (k) . iktder conditions 
b, Xq 2 (“7.03) > ("4.06) , and vinder condition c, Xq2("2.9)< Xq 2(»5.8) . An 

inspection of (3-59) euid (3-60) shows that the above reasoning holds. 

Note that both p^ ai»l p^ relate the first sensor with the second one, 
and is a higher order relation than p^. Furthermore, the rows of 
are tK>t linearly dependent for any value of y 2 . However, the parity error 
associated with P 2 is smaller than that of p^ in all cases except condition 
a. This shows that a higher order peurity function (which is likely to contain 
high order effects of y^) is not necessarily more vulnerable to modelling 
errors and toise. In addition, a parity function baLsed on a C matrix with 
rows that are linearly dependent for all values of y does not necessarily 
produce a smaller parity error thaui a paurity function that is based on a C 
with independent rows. 

In Tad>le 3-7 we have tabulated the signatvure to paurity error ratio (ir^) 
associated with the parity function for sensor bias failures under conditions c 
and d. (ir^ denotes the signature to paurity error ratio associated with a 


-90' 


TEST 

PARITY 

TT 



COND. 

FUNCTION 

1 

2 

3 


Pi 

.243V^ 

•SOdV^ 

- 

c 

P2 

.176V^ 

1 .934V2 

1 


P3 

•022V, 

1 

- 

•lO’Vj 


^1 i 

.390Vjj^ 

■788Vj 

- 

d 

P2 

. 733V^ 

.693V, 

- 


P3 

. 046Vj^ 

- 

. 126 V 3 


TABLE 3-7: u Values for Selected Test Conditions 
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parity function for a sensor i bias failure of nagnitude v.) Ihe parity 
function has a larger in condition c and a larger ir^ in condition 

d; p^ has a larger tt^ in condition c. The parity function p^ has very 
snail TT^ and in both conditions c and d. Therefore, p^ is not very useful 
for detecting a small bias in either sensor 1 or sensor 3. 

Further Discussion of Minimax Solution 

Earlier in this section, we discussed a single method of solving the 
miniroax probl^ when the objective function os(Y)a' is quadratic in Y* 

The simplicity of the solution is a direct consequence of the fact that the 
maximization of aS(Y)a* with respect to Y (3-50) can be replaced by the 
maximum among aS(Y^)a' at the finite number of corners (Y^) of F (which is 
assumed to be a hyper-rectangle) for all a* Whenever this replacement can 
be done, this simple solution method applies. This is possible, for ins- 
tance, when (xS(Y)a' is convex over F for all a (oa'»?.) . Nonlinear dependence 
of the elements of A, B, and C on Y ^<1 higher order effects (due to prc^ucts 
such as CA, CA , etc.) in C, 4, and B will ma)ce the objective function of the 
minimax problem (aS(Y)o' or [oA]S (Y) [oX] ' when there are actuator terms) 
non-quadratic in Y* 1^- such cases, the task of checking for convexity is 
generally difficult. Moreover, convexity is not a necessary condition for 
replacing the auucimization of as(Y)oi' over Y by the maximum of 
as(Y^)o', i>»l,...,M (where M is the number of corners of F) . By examining 
the geometry of the problem, some insights have been obtained, and we will 
discuss them in the following. 
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Suppose there is on. f a scalar parameter Y and there are only two 
elements and 0^, among all elements of C and that are dependent on y. 
(Although we will only consider aS(Y)Q(' in this discussion, it is clear 
that the extension to include actuator terms is immediate.) Then, and 

are also dependent on each other. Let us write * aS(Y)CX'} 

is a positive quadratic function of and for any value of a. 

For each value of a, we can draw the ellipsis different 

values of 0 in the plane (Figure 3-3a) . Note that the ellipsis increases 

in size with increasing 9. 

Because and both depend on Y, they can only take on values along 

the curve (0^^ (y) (Y) ) with Y varying ever its range F. The curve is 

characterized by a scalar equation, say G^(o^,02)*0 (see Figure 3-36). By 

tracing al jng this curve (called G^) and noting the values of h^ and Y along 

it, we can obtain the function b^(Y) • aS(Y)a* for a fixed a (see Figure 

J-3b) . It is evident that the maximum of h does not occur at either end 

a 

of r (in the case shown in Figure 3-36) , Now suppose the relationship 

between and is characterized by another curve (dj^,O2)“0 (Figure 3-4a' . 

The plot of h versus Y for G. (Figure 3-4’u) shows that the maximum of h is 
a 2 a 


at Y 


Based on the geometry described above, we can re— state the condition 


~ - 1 - 2 

under which max h (Y) = max[h (Y ),h (Y )) for a fixed a as follows. 
.. o a a 


(We will let O * simplify the notation). For each a, consider 

the ellipses h (o) » h (Y^) and h (o) = h (y^) that pass through o(f^) and 
a a u (X 

2 

0(Y ) respectively. Define Y* such that 
Y* ■ a-g max (Y^).^ (Y^H 

2 a a 

Y ,Y 


( 3 - 6 -’: 
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ho(oiflj)=S 


o-(y') 
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ttote that y* is dependent: on a, and h^(0) « i* the bigger of the 

1 2 

two ellipses passing through o(Y ) and a(Y )> Then, 

- - 1 ' 2 

BMX h (Y) ■ maxlh (Y ) »h (Y )1 if and only if the curve G (describing the 
Y o o o 

~ * 

relationship between and O 2 ) lies within the bigger ellipse h^(o) ■ h^(Y )• 
Mathematically, this is equivalent to the condition 


h (0) < h (Y*), oe{ai G(o)-o} (3-63) 

V ~ (X 

It should be noted that a brute-force approach to testing condition (3-63) 
will result in the evaluation of h^(Y) for all Yfil". A conceptually simpler 
approach to testing this condition is described below. 

Consider the siiriultaneous equations 

h (a) - h (Y*) (3-64a) 

a a 

G(0) “0 (3-64b) 

Assuming O is continuous in y (hence the curve G is continuous) , we can 

deduce that if the set of solutions to (3-64) does not contain any point 

a other than o(y ) and/or o(Y ), then the curve G is either ccsnpletely inside 

or outside the ellipse h^(0) ■ h^(Y*) (and touching it only at a>y^) and/or 
2 

0(Y )). Then, the curve G lies inside the ellipse if the follovAng is 
true. 

h ( 0 )< ii (Y*), a e(0: G(O)«0} (3-65) 

a a 

Therefore, the testing of the condition (3-63) requires studying the 
solutions of the simultaneous equations (3-64). If the solution set of 


6 
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1 2 

(3-64) does not contain any a other than a(y ) and a(y ) and (3-65) holds, 
then max ■ i»x[h^(Y^) ,h^(Y^) ] for a fixed O. 

In order for the simple minimax solution described earlier to apply, 
condition (3-63) has to hold for all a such that oa*®!. This isqplies that 
have to examine the solution of (3-64) and condition (3-65) as a function 
of a. This is generally a difficult task, because, even for a fixed a, the 
solution of (3-64) is difficult to determine for an arbitrary G. Nevertheless, 
this approach provides an important perspective on the problem, i.e. the 
objective function <xS{y)a is now separated into h^(CT), tfhich is independent 
of y, and G(0), which contains all the effects of y. This explicit isolation 
of the effects of y will allow us to exploit the structure of G (i.e. the 
inter-dependence of and through this mutual dependence on Y) to deter- 
mine if (3-63) holds. However, future work is required to develop this 
concept into a practical procedure for testing (3-63) . In closing, %ie note 
that the adx>ve discussion for the case of 2 a's and a scalar y can be readily 
generalized to include multiple a's and a vector y. 

3.6 Sunanary 

In this chapter we have developed an approach to the design of robust 
residual-generation processes. We have examined in detail the three basic 
steps of the design methods i) the choice of parity structures in the 
presence of modelling uncertainties, ii) the design of peurity coefficients, 
ii: ) the clioice of parity functions for generating residuals based on 
signature to parity error ratios. 

C 
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The oonceptijalization of analytical redundancy as parity relations 
together with the design method described in the chapter represent a new 
approach to the design of residual-generation processes for PDI. The for- 
mulation of the parity coefficient design problem as a minimax optimization 
provides a basis for ej^loitating analytical redundancy in a robust manner. 
Although the minimax problem is difficult to solve, a sinqple solution 
procedure has been found for some special cases. The insights provided by 
this solution method will aid in the study of the solution if more general 
minim ax problems required to design robust residual-generation schemes. 
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CKAPTER 4 

OPTIMAL SEQUENTIAL DECISION RDLBS 

4.1 The Sequential Decision Probiem-Baekground 

The output of the residual -generation process is the remdon residual 
sequence, r(k) . The behavior of the residual is described by a set of pro- 
bability density functions {p(r(l) , . . . ,r(k) | (1,T,V) ) , k-1,2,...,} that is 
chauracteristic of the presence of the failure (l,x,V) (the notation (l,T,v)" 
(0,-,-) denotes the absence of any failure), and such probability density 
functions represent the signature of the failure. The FDI process involves 
monitoring the residual for changes from its normal (no-fail) behavior. 
Residual san^les ue observed in sequence. If a failure is jv^ged to have 
occurred and sufficient information (from the residual) has been gathered, 
the monitoring process is stopped. Then, based on the past observations of 
residual, an identification of the failure is made. If no failure has oc- 
curred, or if the information gathered is insufficient, monitoring is not 
interrupted so that further residual samples may be observed. The decision 
to interrupt the residual-monitoring to make a failure identification is 
based on a compromise between the speed and accuracy of the detection, and 
the failure identification reflects the design tradeoff among the errors in 
failtire classification. Such a decision mechanism belongs to the extensively 
studied class of sequential tests or sequential decision rules. In this 
research, we will extend existing concepts and formulations of the sequential 
decision problem to the design of decision rules for FDI systems. 

The first important piece of work in sequential analysis was presented 
in 1947 by Wald I lol , where the Sequential Probability Ratio Test (SPRT) %«s 
proposed as a procedure for testing a simple hypothesis against a simple 
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alternative. It wts shown, a year later, by Wald and Vtolfotdtz [18] that 
the SPKT is optimal in the sense that among the class of sequential tests 
that have misrlassification errors not greater than those of the SPRT, the 
SPRT will take the smallest expected number of sauries to reach a decision. 

The conceptual auid structural simplicity of the SPRT has made it the basis of 
many studies in the design and application of sequential tests. For example, 
the SPRT was esployed as a means of identifying aircraft sensor failiures 
[ 11 } , and SPRT- like tests were developed for robust signal detection 1 19] . 

In addition, modifications such as those investigated by Anderson I 20] and 
Chien and Adams [21] have been suggested to enaible the SPRT to deal with 
a wider class of problems in a satisfactory manner. Specifically, Anderson 
has modified the decision thresholds of the SPRT so as to maintain a reason- 
able expected sample size when the same SPRT is used in testing a simple 
hypothesis against a ccn^site alternative, and Chien and Adams have introduced 
resets for the SPRT in order to detect a chemge from one hypothesis to another 
at some unknown time. In the latter case, the change in the hypothesis at 
an unknown time indeed models the occurrence of a (the only possible) failure. 

The decision problem to which the SPRT is the solution is a special emd 
very single case of the general Bayes Sequential Decision Problem (BSDP) first 
studied by Arrow, Blackwell and Girshick and later described by Blackwell and 
Girshick I 22 ] . The BSOP provides a general formulation that can be adapted 
to many meaningful decision problems. In particular, it is suitable for the 
FDI decision problem, where the occurrence of a failure may be regarded as a 
change from the normal (no-fail) hypothesis to seme failure hypothesis at an 
unknown time. The optimal Bayes Sequential Decision Rule (BSDR) has a form 
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sifflilar to that of the dynamic programning solution to the discrete time 
optimal control problems [ 17 ]. Even with the aid of modem computers, 
the BSDR can rarely be calculated for the general prc^lem, and only in some 
limiting cases 1 23] 1 24 ] has some feel for the solution been obtained. 

For this reason, alternative suboptimal sequential procedures were often 
investigated as means of solving sequential decision problesia e.g. , [35 ] 

[26] [ 27]. 

The inclusion of changes among hyiotheses at unknown times furtlMr 
complicates the computational aspect of the BSDR. As noted by Chemoff 
and Backs in their study of estimating a parameter which may change in time 
[28 ], only suboptimal or ad hoc procedures are practical solutions. However, 
for the case where only a charge from a simple hypothesis to a siaq>le al- 
ternative is allowed, some useful results are availidrle. As mentioned earlier, 
Chien and Adams were able to modify the SPRT to acccsmodate this feature. 

In addition, Shiryaev solved the problem in the Bayes formulation [29 j, 
where the Bayes objective used was a means of stating the desire to ad.nimise 
the expected delay to detecting a change while keeping the probability 
of false alarm or the expected nundaer of false alarms before the Grange to 
some fixed value. Such an interpretation of the Bayes formulation suggests 
the usefulness of the BSDP for incorporating the numerous tradeoff issues 
of the decision process in FOI into a single design objective. In this «ray, 
the BSDP provides a conceptually simple design objective for the FDI pr^lem, 
and it has been the basis of our research in the design of sequential de- 
cision rules. Although the optimal solution is impossible to calculate, the 
structure of the Bayes formulation provides a frame%«ork in which sisqtle 
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suboptinal rules can be derived. We shall report our work along this direc- 
tion in the remainder of this chapter and the next. In section 4.2, the 
Bayes formulation of the FDI problem is discussed. The nature of the optimal 
BSDR and the difficulties in its calculation are examinee in Section 4.3 for 
ways of obtaining simple subopt imal rules. Chapter 5 contains the resulting 
approach to designing subopt imal sequential decision rules. 

4.2 The Bayes M>proach for FDI 

In this section we will present a formulation of the FDI decision pro- 
cess as a BSDP, and we will discuss the advantages and limitations of the 
Bayes approach as a tool for designing decision rules for FDI. 

In a sequential decision problem, the decision maker is allowed to 
make a sequence of observations. After each observation, he will decide 
whether to stop sampling and choose a terminal action, or to continue sam- 
pling emd postpone the terminal decision to a later time. Hence the sequen- 
tial decision rule consists of a stopping rule and a terminal decision rule 
which, in the FDI problem, are used to determine when to interrupt the 
residual-monitoring and what failure declaration to make, respectively. Each 
sequential decision rule leads to a particular performance tradeoff determined 
by the Bayes formulation. As we develop the Bayesi 2 m approach to FDI, ve will 
see how the inherent tradeoff issues are incorporated into the formulation. 

The BSDP formulation of the FDI problem consists of six elements: 

1} 0: the set of states of nature or failure hypotheses. An 

element 6 of 9 may denote a single type 1 failure of size V occurring at time 

T (6»(1,t,V)) or the occurrence of a set of failures (possibly simultaneously), 

i.e. 0» {(1-, T, ,V, ),..., (i ,T ,v )}. Although they do not add to the 
111 n n n 
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structural cOTiplexity of the decision problem, multiple failures do increase 
the size of 0 and hence also increase the cbiq^tational burden in the solution 
of the problem. In this study we will focus our attention on single failures 
for simplicity. Moreover, since failuresi* occur infrequently, it is unlikely 
tor failures to occur in rapid succession, t Su>7ficient tisw is generally 
available for detecting and identifying a failure and re-starting the decision 
process before another failure occurs. From this vantage point, single fail- 
ure indeed represent the dominant phenomenon. 

In many applications it suffices to just identify the failure type 
without estimating the failure size. a>en we may consider coagwsite failure 
hypotheses of the form (1,t) - a type i failure of any size occurring at time 
T. Moreover, it is often true that a detection system based on (i,T,V} for 
some appropriate v ceai also detect and identify the type of the failure 
(l,T,v) for v^v. Thus, we may use (i,T,v) to represent (t,T). In the air- 
craft sensor FDI problem [UJ, for instance, excellent results were <d>tained u^ag 
this approach. In situations where it is necessary to estimate v in order 
to identify the failure type or choose a post-failure remedial action, a 
finite grid of failure magnitiules should provide sufficient resolutim. In 
both cases, the failure size can be absorbed in the index 1 so that (i,T) may 
represent a cosg>osite hypothesis or a failure of a certain magnitude. Now 
we have the discrete nature set 

0 ■ {(1,T), i™l,...,M, T"l,2,...,} (4-1) 

vfhere we assiane there are H different failure types of interest and any one 
of them may occur at time T*l, or 2, or ... . 
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2) y: the prior probability mass function (PMP) over the nature 
set 6. This PMF represents the a priori information concerning the failure, 
i.e. how likely it is for each type of failure to occur, and vrhen is a fail- 
ure likely to occur. Because this information may not be available or ac- 
currate in some cases, the need to specify y is a drawback of the Bayes 
approach for such cases. Nevertheless, we will see that it can be regarded 
as a parameter in the design of a BSDR. 

In general, y may be arbitrary. Here, we will ei^loy a special form of 
y. We assume the vinder lying failure process has two properties: i) the N 

failures of 0 are independent of one another, and ii) the occurrence of each 
failure i is a Bernoulli process with (success) parameter p^. The Bernoulli 
process (corresponding to the Poisson process in continuous time) is a com- 
mon model for failures in physical components [ 30 ] ; the independence 
assumption describes a large class of failures (such as sensor failures) 
while providing a sin^>le approximation for the others. In this framework, 
the set 6 consisting of single failures only represents a subset, albeit a 
dominant one, of the exhaustive events defined over all possible outcomes 
including multiple failures. Hore precisely, 9 only describes the eirrival 
of the first failure. Hence the PMF defined over 6 is a conditional PNF- 
y(l,T) is the conditional probability that a failure will occur at time T 
and that the failure will be of type i, given that this failure is the first 
ever to occur. Using the preceding reasoning, it is straightforward to show 
that 

y(i,T) « a(l)p(l-p)^"^ i-l,...,M, T-1,2,..., (4-2) 


-104> 


where 


N 

p ■ 1 - n (i-p.) 

j-i ^ 


0 ( 1 ) 


pi(i-Pi> 


I I p.(i-pj“^i‘^ 
j-i ^ 3 


(4-3) 


(4-4) 


The parameter p may be regarded as the parameter of the combined (Bernmilli) 
failure process - the occurrence of a (any) failure; 0(1) can be Interpreted 
as the marginal probability that the first failure is of type 1. !lote that 
(4-2) indicates the arrival of the first failure is ra^noryless, l.e. 

_ T - T -1 

yd.xino failure before T ) = o(i)p(l-p) 

° (4-5) 

- y(i,T-TQ), i-l,...,M T>Tq 

This property will be useful in obtaining tiaie-lnvariant suboptlmal decision 
rules (see Chapter 5) . 

3) P(k) : the discrete set of terminal actions (failure identifications) 

available to the decision maker when the residual-monitoring is stopped at 
time k. kn element 6 of P(k) may denote the pair (j^t), i.e. declaration of a 
type j failure to have occurred at time t. Alternatively/ d may r^praaent an iden- 
tification of the j-th failure type without regard for the failure time 
( 6 ■■ (j/t))/ or it may signify the presence of a failure without 

specification of its type or tlsie, l.e. siBg>ly an alarm 
Since the purpose of FDI is to detect and identify failures that have oc- 
curred/ P(k) should only contain identiflcatiom that either specify failure 
times at or before k/ or do not specify any failure time. As a result/ the 
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nuBdMr of temlnal decisions specifying failures tisMS grow with k tnhile 
the nvBsber of decisions not specifying any time will resMln the same. 

In addition, V[k) does not include the declaration of no failure, since the 
residual-monitoring is stopped only when a failuie a^ears to have occurred. 
By continuing the residual -monitoring, it is understood that no (or insuf- 
ficient) evidence has been gathered to declare any failure. 

4) L(k;6,6)i the terminal decision cost (loss) function at time k 

defined over 6 x P(k). L(k;0,6) denotes the penalty for deciding £d)(k) 

at time k when the true state of nature is 6 • (i,T). It is assumed tc be 
bounded and non-negative and have the structure: 


'.(k;(i,T),6) 


La,x),6) 


T<k, 6eP(k) 
T>k 6eP(k) 


(4-61 


where L(6,£) is the underlying cost function that is independent of k; 
denotes the penalty for a false alarm, and it may be generalised to be 
dependent on 6. It is only meaningful for a terminal action 
(identification) that indicates the correct failure (and/or tisw) to receive 
a lower decision cost than one that indicates the wrong failure (and/or time) . 
We further assume that the penalty due to an incorrect identification of 
the failure time is only dependent on the error of such an identification. 
That is for 6« (j,t) , 


L((l,T),(j,t)) » L(i,j,(t-T)) 
and for 6 wltii no time specification 


(4-7a) 


L((i,T),6) • L(i,«) 


t4.7b) 
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Clsarly L provides th« mam for poMliElng th« vAriou* ecou-d«t«etlon« 
Accordii^ to how ondoslratolo o«oh of tha« ia. 

5) r(k)t tha residual (cd>servation) sequence. (Me shall use rOc) 
to denote both the rmdon variable and its value, but the meaning will be 
clear from the context.) Me shall assvsne r(lc)€ R^. The residual samples 
need not be independent and identically distributed in general, but their 
joint distribution is dependent on 6 and is assumed to be known. Me shall 
let p(r(l) , . .. ,r(k) I (i,T)} denote their joint conditional density. 

Assming that tlM residual is affected by the failure in a causal manner, 
its conditional density has the property 


p(r(l),...,r(k)l (i,T)) - p{r(l),...,r(k)| (0,-)) 

i-J , . . . ,H, T>k 


(4-8) 


where (0,-) is used to denote the no-fail condition. In this research, we 
further assume that the residual is an independent Gaussian sequent with 
tijse- independent covariance function V(sB(m matrix), i.e. 


k 

p(r(l),...,r(k)l(i,T) - II P(r(s) 1 (i,T)) (4-9a) 

S-1 


p(r(k)l (i,T)) 



I I(r(k)-q(k»i,T))’v‘ 
x(r(k)-g(J<,i,T))l) 


(4-9b) 


where g(k;i,T) is the mean of the residual given that the failure (i,T) has 
occurred. With the covariar^e assumed to be the same for all failures, the 
b«an ftinction g(kil,T), characterizes the effect of the failure (i,t), and 
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it is hanoaforth callad tha signature of (i,T), with g(k|0,-)>0 for tha 
no-fail condition. For time-invariant systems g(k;i«T) bacomas 


g(k)i,T) 




0 


k>T 

k<T 


(4-10) 


tfa have chosen to study residuals of the form (4-9) and (4-10) Isecause 
their wpacial structure facilitates the develt^sMnt of insights into tha 
dasign of dacision rulas. Moreover, the Gaussian assvm^ion is reasonable 
in many problems and has met with success in a wide variety of aj^licatlons, 
e.g. , ( 5] {11]. It should be noted that the use of more general proba- 
bility densities Sac the residual, e.g. time-dependent V and signatures 
that depend on both k and T (g(i,k,T)), will i^t invalidate the discussions 
in Section 4.3. The single signature (4-9) aM (4-10) considered here will 
facilitate the design of suboptimal rules (sea Chapter 5) . 

6 ) w():,(i,T)}i the delay cost function naving the properties: 

w(k, (i,T)) • j w(i,k-T) >0 T<k (4-lOa) 

^ 0, T>k 

w(i,k -T)> w(i,k_-T) k, > k„ >T (4-lOb) 

1 2 1 * 

After a failure has occurred at T, there is a penalty for delaying the 

i 

terminal decision uncil » k>T with the penalty an increasing ftmction 
of the delay (k-T) . In the absence of a failure, no penalty is imposed on 
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tn« Milling. In this study ws will consider a daisy cost function that 
is linear in the delay, i.e. 

jg(iHk-T) T<k 

c(i,k-T) - J (4-11) 

I 0 T>k 

where c(i) is a positive function of the failure type i, and may be used to 
provide different delay penalties for different types of failures. 

We have described the setting cf the BSDP for the FDI decision process. 
The most ia^rtant feature presented is the tradeoff structure provided by 
the terminal decision and delay cost functions. Generally, the more obser- 
vations the decision maker has, the more certain he is about the true state 
of nature, and this will lead to a lower expected terminal decision cost 
«diich is due to false alarms and incorrect detections. CM the other hand, 
he is penalized for the delay Ir: dscisAon that results from taking more 
observations. Hence, the cost functions L and w together form the ba. is for 
ucmsiderlng the tradeoffs among the various performance issues (detection 
delays, false alarms, etc.) simultaneously when the design objective is to 
Biinimize the total expected cost. Now proceed to characterize sequentla. 
decision rules for minimizing the total cost, employing the approach of 
Ferguson 131] . 

A sequential decisitm rule naturally consists of two t .ts: a stopping 

rule (or sai^ling plan) and a terminal decision rule. The stopping rule, 
denoted by (0(0) ,^(l;r (1) ) , . . . ,4>(k; r(l) , . . . ,r (k) ) , . . . ) is a sequence of 

iunctions of the observed residual samples, with 0(k}r (1) , . . . ,r(k) )■!, or 0. 
When 0(k;r(l) , . . . ,r (k) )■!, (0), residual-monitoring or sai^ling is stopped 
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(continued) after the k residual sanples, r(l) , . . . ,r (k) are observed. 
Alternatively, the stopping rule may be defined by another sequence of func- 
tions = (ij<(0),^(l;r(l)),...,<f»{k;r(l) ,...,r(k)),...), where 
ij)(k;r(l) , . . . ,r(k) )«1 (0) indicates that residual -monitoring has been carried 
on up to and including time (k-1) and will (not) be stopped after time k 
tdien residual sanq>les, r (1) , . . . ,r (k) are observed. The functions $ and ¥ 
are related to each other in the following way 

k-1 

^(k;r(l),...,r(k)) = $(k;r(l) , . . . ,r (k) ) II ll-4>(s,r{l) , . . . ,r (s) ) J 

S=0 (4-12) 

with lj<(0) = 4>(0). The conditional probability of stopping at time k, given 
that the true state of nature is (i,i) , is 


E, ^ii<(k;r(l),... ,r(k)) 




i|)(k,r(D, 


m 

R X. 


_m 


. .xR 


,r(k) )p(r(l) , . . . ,r(k) ]i,T)dr(l) . ..dr(k) 

(4-13) 


and the probability, P^(i,T), of eventually stopping given 0 = (i,T) is 


P (i,T) 
s 


\ E. J»(k;r(l),...,r(k)) = 1 
k=0 


(4-14) 


If Pg(i,T) f 1 for all (i,T)e 0, it is possible for the sampling to go on 
indefinitely even in the presence of a faiJ.ture, and the expected delay cost 


will be infinite. Therefore, only stopping rules will P (i,T) * 1 cure 

s 

meaningful . 
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The terminal decision rule is a sequence of fxinctions, 

D * (d(0) ,d(l;r (1) ) , . . . ,d(k;r (1) , . . . ,r (k)), • . . ) , mapping residual sauries, 
r(l), — ,r(k) into the terminal action set P(k). The fuitction 
d(k;r (1) , . . . ,r(k)) represents the decision rule used to arrive at em action 
(identification) if sampling is stopped at time k and the residual samples, 
r(l), — ,r(k) are observed. Actually, D only needs to be defined for those 
r(l) , . . . ,r (k) for which l{)(k;r(l), — ,r(k)) - 1. But it will become clear 
that it is useful to consider terminal decisi >n rules independently of the 
stopping rule . 

The PDI sequential decision procedure consists of two steps. According 
to the sampling plan, the decision maker determines if he is to continue 
the residual-monitoring. If he is to stop, he makes a failure identification 
according to the terminal decision rule. As a result of using the sequential 
decision rule (^>,D) , given (i,T) is the true state of nature, the total ex- 
pected cost is; 


UQlti,T) , {$,0)1 


I E. {i|.'(k;r(l) ,...,r(k))[c{k,(i,T)) + 
k=0 

L(k; (i.T) ,d (k;r (1) , — ,r (k) ))] } 


(4-15) 


The BSDP is defined asr determine a sequential decision rule ($*,D*) so that 
the sequential Bayes risk is minimized, where 


U ($,D) = En [(i,T) , ($,D)] 
s 0 


M o" 

I I y(i,T)Unni»T) , ($,D)] 
i=l T=1 


(4-16) 


($*,D*) is called the Bayes Sequential Decision Rule (BSDR) with respect to 
y, and it is optimal in the sense that it minimizes the sequential Bayes risk. 


ORIGINAL PAQg K 
OE POOR QUALITY 
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r?he BSDR can be determined by carrying out the minimization of (4-15) in 
two steps: first with respect to 0, then with respect to While we post- 
pone further discussions of the BSDR to the next section, we proceed to 
examine the Bayes risk closely for a better understanding of the BSDP. 

Using (4-6) and (4-11) , we can re-write Uq as 


T-1 

U.I(i,T),($,D)] = L I E. {i{'(k;r(l),...,r(k))) 

° ^ k=0 

00 

+ c(i) J I(k-T)E. {ip(k;r(l),...,r(k))}] 
1 - 


k=T 


+ I fE. {i/)(k;ra),...,r(k))L|L,T),d(k;r(l),...,r(k)))}] 
k=T ' 

^ (4-16) 


In the following we will describe a special interpretation of the sequential 
risk for the FDI problem. Let us define the following notation 


T-1 


~ 1 _ip(k;r (1) , . . . ,r (k) ) 
^ k=l 


(4-17) 


P = U P(k) 

k=0 


(4-18) 


S(k,6)={ [r(l),..,r(k)]:ii)(k;r(D,...,r(k)=l,d(k,r(l),...,r(k))=6}, 

SeV 


? {S(k,5) |i,x} = J[ p(r(l),...,r(k) |i,T)dr(l)...dr 
S.(k,6) 


(4-19) 
(k) (4-20) 
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where Pp(x) is the probability of stopping to declare a failure before the 
failure occurs at T, i.e. the probability of false alarm «Aien a failure 
occurs at time T; V is the set of terminal actions for all times; S(k»6) is 
the region in the sample space of the first k residuals where the sequential 
xrule ($,D) yieldsthe terminal decision 6. Clearly, theS(k,6)'s are 
disjoint sets with respect to both k and 5, and we note, that 


E. T|^(k;r(l),...,r(k)) = I P {s (k,a) | i,x} 

5eP(k) "" 

Then (4-15) can be expressed as 


(4-21) 


UQl(i,T),($,D)]=LpP^(T) + (l-Pp(T)){c(i) I I(k-T) (l-Pp(T))‘V .^k,r(l) , . . ,r (k) ) J 


+ I L((i,T),6) I [P {S(k,6)Ii,T}(l-P_) 

6eP k«T "" ^ 

(4-22) 


By (4-13), (1-P (T)) ^E. ip(k;r (1) , . . . ,r(k) ) for k>T is the conditional 
F 1,T 

probability that residual-sampling will be stopped at time k, given a type i 
failure has occurred at time T and the saunpling process has not been stopped 
before the failure occurred (i.e. no false alarm), and (4-21) then tedces the 
form 


UoI(i»T),($,D)]=bpP^(T)+(l-Pp(T)) [C(i)t(i,T)+ 


I 

SeV 


L((i,T),6)P((i,T),6)] 

(4-23) 


where 
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t(i,T) = 


k-x 


(k-T) (1-P„(T)) 


E. i|;(k;r(l) 
1»T 


,r(k)) 


(4-24) 


P((i,x),6) - I P {S(k,6) !i,T}(l-P_) 
k-T ^ 


-1 


(4-25) 


The expression t(i,x) is the conditional expected delay in decision (itS. 
stopping sampling and making a failure identification) , given a type i 
failure has occurred at time x and no false alarm has been signalled Isefore 
this time. Similarly, P((i,x),6) is the conditional prol>ability of even- 
tually declaring 5, given an i-th type failure has occurred at time T and 
no false aleunn has oeen signalled — P((i,x),6) is the generalized cross- 
detection probability. Finally, the sequential Bayes risk u can be vnritten 

s 

as 

M “> 

n(t.D) = I J u<i,T){l P.CT)+(l-P_'t))[c(i)t(i,T)+ J U (i,T) ,«)P( U,T) ,S)) 

(4-26) 


Equation (4-26) indicates that the sequential Bayes risk is a 
wei ''°d combination of the conditional false alarm probability, expected 
delay to decision and cross-detection probabilities, and the optimal se- 
quential rule (9* ,D*) minimizes such a combination. From this vantage 
point, the cost functions (L and c ) and the prior distribution (p) provide 
for the weighting, hence, a Ijasis for indirectly specifying the tradeoff 
relationships aiDong the various performance issues. The advantage of the 
indirect approach is that only the total expected cost instead of every 
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indivldual performance issue needs to be considered explicitly in designing 
a sequential rule. The drawback of the approach, however, lies in the 
choice of a set of appropriate cost functions (and scxaetimes the prior 
distribution) ^en the physical problem does not have a natural set, as it 
doesn't in general. In this case, the Bayes approach is most useful with 
the cost functions (and the prior distribution) considered as design para- 
meters that may be adjusted to obtain em acceptable design. 


4.3 The Bayes Sequential Decision Rule 

In this section, we will describe the optimal solution to the BSDP. 
Before we do that, it is instructive to examine the (unconditional) ex- 
pected delay and terminal decision cost at time k, U(k), for a terminal 
action 6eP(k) 

M “ 

U(k) = I J Ic(k; (i,T)) + L(k;(i,T),5)]p(i,T) 
i=l T=1 


M k M » 

■II [C(i;(k-T) L((i,T) ,6)]p(i,T) + I I P(j»t) 

i=l T=1 ^ j-1 t-k+1 


M k 

- I I [c(i,k-T) + L((i,T),6)]u’(k;i,T) + L^'(k;0,-) 
i=l T-1 


where the first equality follows from the definitions, the second one is 
a direct ccmsequence of (4-6) and (4-10) , and the last one follows from 
the notation 

! y(i,T) i»l,...,M, T-l,...,k 

M ® 

I I y(j»t) 

j-l, t-k+l 


i-0 


(4-27) 
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where (0,-), i.e. 1-0 is an artificial failure state representing the event 

of a failure occurring after the terminal decision tine k. Alternatively 

(0,-) may be viewed as the no-fall state that has a dwindling prior pro- 

bed>llity as tine progresses. Clearly, y* Is a tine-dependent PMF, and it 

effectively defines a growing nature set, 0(k) - {(1,T), i-0,l,...,M, T-l,..»k}, 

that corresponds to the increasing number of possible failuie tines. For 

fallure-nonltorlng over the Interval [l,kQLy* (k^>0,-) denotes the prdbabillty 

of no failure over the Interval with (0,-) as the no-fall state. Since 

they can be used Interchangeably in the calculation of the expected cost, 

(6,y) and (0(k),y'(k)) are equivalent for the BSDP. But the latter will 

r 

be used, due to the resulting simplification In notation. We now proceed to 
describe the BSDR, 

It is clear that the expected delay cost Is Independent of what temlnal 
decision Is made. Thus, once sanpling Is stopped the optimal tenolnal 
decision rule must be that which mlninlzes the oq>ected decision cost. This 
Inplles that the Bayes terminal decision rule D* is independent of the 
stopping rule and is a sequence of flxed-san^le-slze (FSS) Bayes rules [32] . 
Therefore, D* - (d* (0) ,d* (l;r(l) ) , . . . ,d* (k;r(l) , . . . ,r (k) ) , . . .) where d*(l;) 
is the k-sample Bayes rule (with respect to y* (k)) that minimises the FSS 
Bayes risk: 


/ N k 

I I y' (kji,T)L((i,T) ,d(k))p(r(l),..,r{k) |i,T)dra)...dr(k) 
1-0 T-1 /A 


(4-28) 


R X. .X R 


Hence, the k-sanple Bayes rule d*(k) also mlninlses the Integrand of (4-28), 
By single manipulations, the Bayes rule can be expressed in terms of the 
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likelihood ratios A or the posterior probabilities q of the nature states 
given the residual 8anq>les« r (1) , . . . ,r (k) [32 ] .' 


N k 

d*(k) • arg min J J L((i,T) ,d(k) )y' (k>i,T)A(k>i,T) (4-29) 

d(k) i-0 T-1 

or 

M k 

d*(k) - arg min J L( (i,T) ,d(k) )q(i,x|k) (4-30) 

d(k) i-0 T«1 


where, assuming p(r(l) , . . . ,r(k) |o,-) 0, 


A(k;i,T) 


P(r(l) ,...,r(k) li.T) 
p(r(l) , . . . ,r(k) |0,-) 


and 


q(i,x|k) - 


U' (k;i,X)p(r(l) , . . . ,r(k) |i, r) 

M k 

\ \ y (k;j,t)p(r(l),...,r(k)lj,t) 

j-0 t-1 


(4-31) 


(4-32) 


The Bayes rule given in (4-29) is in the form of a likelihood ratio test. 

In some cases, it is more convenient to work with the log likelihood ratios 

Itk;i,T): 

L(k;i,X) - in A(k;i,X) (4-33) 

and (4-29) can be transformed accordingly. In general, the PSS decision 
rule divides the residual sample space into terminal decision regions 
[T(k,5), k-1,2,...} such that d(k;r(l) , . . . ,r (k) )-6eP(k) if 
(r(l) , , . . ,r(X9e T(k,6), Then (4-28) can be re-written as 
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M k 

lT(d(k)) - I Ju'(k,i,T) I L((i,T),6)Q(kji,T,6) (4-34) 

i«0 T*1 66P (k) 

where 

Q(kii,T,6) - rp(r(l),...,r(k))dr(l),...,dr(k) (4-35) 

T(k,6) 

Note that 

I Q(k}i,T,6) - 1 (4-36) 

aeP(k) 

The quantity Q(k;l,T,£) has the Interpretation of a k-sanqple detection 
probability (i,e. prob^d^ility of deciding 6 based on k observations when 
(ifT) is true) of which the probabilities of cross detection# correct 
detection, etc. of the k-sanple decision problem are special cases. From 
(4-34) we see that the Bayes rule d*(k) minimizes a linear combination of 
the detection probabilities where the decision cost function and the 
prior probad>ility y' play the role of weights. 

Now we will turn our attention to the optimal stopping rule. First we 
will consider stopping rules that terminate saiqpling at or before time k>0, 
i.e. 

K 

I ij)(k, r(l) r(k)) - 1 (4-37) 

k-1 

A sequential decision problem using such a rule is said to be truncated at 
time k. Then, the optimal stopping rule for the non-truncated prc&lem can 
be obtained from the optimal truncated rule by letting K**«b. The optimal 
truncated rule can be determined via a straightforward application of the 
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prlnciple of optiaallty [33]* and we will state the result (tdiioh is 
contained in [22] and [31], for exanple) in the following. 

Let us define J(k) to be the total expected cost of stopping at tisM k 
and applying the optinal terminal decision rule d*(k), given that 
r(l),...,r(k) have been observed! 


M k 

J(k) - I X qOt»i#T|k) [c(i) (k-T)+L(i,T) ,6*)] 
i-1 T-l 

+ q(kjO,-|k)Lp 

where 


(4-38) 


6* - d*(k;r{l),...,r(k)) 


(4-39) 


I I y'(k!j,t)p(r(l) r(k)|j,t) 

j»l t-l 


(4-40) 


Ihe minimum expected cost-to-go at time k# <lj^(k) , for the decision problem 
truncated at K is given by 


Jj,(k-1) - min[J(k-l),E{j(k) |r(l),...,r(k)}], k-l,...,K-l (4-41a) 


3fj^(K) - J(K) 

Note that both J(k) and 3j^(k) are functions of r(l) ,...>r(k) . 


(4-41b) 


* The principal of optimality: "An optimal policy has property that 
whatever the initial state and initial decision are, the remaining deeisi<ms 
must constitute an optimal policy with regard to the state resulting from the 
first decision." 
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7h« optiaal stewing rule for the truncated pc^laa la 
•J - (♦Ja}ra)),...,^J(k;r(l),...,r(k)),...) 

•dtere 

j 1 if J(k)< E{j(k+l)|r(l),..,r(k)} 

♦•Ckfr(l),...,r(k)) - I 

' 0 otherwise (4-42a) 

♦J(K|r(l),...,r(K)) - 1 (4-42b) 

is, sampling is terminated at time k if the total Mnditional a 3 Q>ected 
cost (J(k)) of stopping at k to use the optimum terminal decision rule# given 
that r(l),...,r(k) have been observed# is lower than that 

•••#r(k}}) of taking an additional saiqple and using the optimum 
sequential rule from them on. Ihe tern 3fj^(k) is the total expected cost of 
using the optimal stopping rule (fm: the problem truncated at K) and the 
optiauB terminal decision rule at times k# k^l,...# and k# given that 
r (1) # . . . #r (k) have been observed. Hence# it is called the optimal ei^ected- 
cost-to*go at k. nie sequential Bayes risk U^(t*#D*) for the truncated 
problem is sisq>ly 

0^(*J#D*) - 3fj^(0)# k-0,1#..# (4-43) 

and wn will let J^(0) denote the (finite) Bayes risk U^(f*#D*) that is 
associated with the optimal non-truncated sequential rule i.e. the 


BSDR. 
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ItM Mt Of all stopping rulas that tsrainats sanpling at car bafora 
tiM k contains tha sat ci all stof^ng rulas that taxninatr.: sanpling at or 
bafora tisw k-1. lharafora> wa hava tha saquanea of inaqualitias 

J^(0) > ... > 3,j<0)> Jj^^^(0)> ... (4-44) 

Dua to tha fact that tha tarainal dacisimt cost function L is boundad, it 
can ba shown that Jjr(O) J^(0) as K-*« (sea ^>pandix A). Oonsaquuitly, the 
optiaal sequenflal rule for the nmi-truncated problas is tha lisdt of 

(0*,O*) as X***, and the foxaer can be approxiaatad by tha latter for a suf- 
ficiently large X. 

Note that the detamination of tha optiaal truncated st^ipii^ mla 
requires solving (4-41) backwards in tiaa. In fact, (4-41) and (4-42) des- 
cribe a dyiuaiic prograasming problea for which tha solution I % axtresaly dif- 
ficult to calculate dua to the incmnse storage and conputation required [17]. 
Consequently, tha optimal stoi^ing rule is generally is^possibla to ocn^ta, 
and suboptiaal rul'.s must ba used. 

Despite tha iaq^actical nature of its solution, the BSOP provides a 

useful framework for designing suboptimal decision rules for tha FDI prt^laB 

because of its inherent characteristic of explicitly weighing tha tradeoffs 

between dataotion spaed and accuracy (in terms of the cost structure) . In 

the ^avious section va saw that a sequential decision rule defines a sat of 

sequMitlal decision regions s(k,o), and tha decision regions corresponding to 

* Since the posterior probabilities q, the likelihood ratios A, and the log 
likelihood ratios X are alternative sets of sufficient statistics, it is easy 
to see that a sequential rule also defines sequential decision regions in the 
space of each of these set of decision statistics. 
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th« B8M yield the ainina risk. Pxooi this ventage point, the design of e 
suboptiJMl rule can be viewed as the pc<d»lem of ohoosing a set of ieeisioa 
regions that would yield a reasonably snail risk. This is the essanea of 
tha approach to suboptiaal nle design that we will describe in Chapter 5. 
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CHAPTER 5 

SUBOTTIMAL SEQUEWTIAL DECISIOW RULE S 

Prom the previous chapter we see that the Bayee formulation of the 
PDZ decision prdhiem is a suitable <me because of its built-in performance 
tradeoff structure. Although the optimal rule (the BSDR) is cowpatationally 
intractable* practical* suboptimal rules with good perforsMuice may be de- 
termined using the Bayes framework. This chapter is devoted to the discus- 
sion of our approach to the design of such suboptimal rules for PDI. Ifhile 
it covers a wide range of issues* this discussion* by far* does not exhaust 
all possibilities. Rather* it will serve to demonstrate how this fraamwork 
can be useful for f'he systematic approach to decision rule designs. 

In Section 5.J we will first examine an approximation sch«se for the 
BSDR that is directed at alleviating its overw’ielming con^tational require- 
ments. The resulting simplified decision rule will provide the basic form 
for a range of suboptima', rules of varying degrees of complexity. As we 
describe these ruleswe will also examine their computational structure so as 
to assess their practicality. In Secti<»i 5.2 we will consider the risk 
evaluation for some simple suboptimal rules. An algorithm for approximating 
the risk will also be described. The choice of design parameters and the 
risk •minimisation procedure will be diset^ssed in Section 5.3, and a siiimai ji of 
the design methodology described in Section 5.1 - 5.3 is included in Section 
5.4. Our experience with this decision rule design methodology through the 
study of num^ricsl exsiqples and simulations is reported in the next chapter. 
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5.1 Suboptimal Rules Based on the BSOR 

5.1.1 'nie Sliding Window Approximation 

nte ianense computation associated with the BSDR is partly due to 
the increasing number of failure hypotheses as time progresses. In fact, 
this phenomenon (often called the "growing-bank-of-filters") is coimon to 
detection schemes, such as the GLR [5 ], where the failure time is ex- 
plicitly taken into account in the failure hypotheses. The remedy for 
the problem studied here, as in t 5 ] for instance, is the use of a 
sliding window to limit the number of failure hypotheses to be considered 
at each time. Ihe application of the sliding window approximation to the 
BSDR for the infinite time horizon problem yields a sequential decision rule 
that uses only a sliding window of a fixed number of residual samples. This 
brings a saving in the storage of residual sarnies, as the BSDR, in contrast 
requires all past sauries. Furthermore, such a sequential rule is indepen- 
dent of time after a window-full of residual samples has been gathered, 
while the BSDR is a time-dependent rule. Because of these desirable sin®)li- 
fications, the sliding window scheme has become the backbone of our study 
of the design of suboptimal rules. We now proceed to describe this approxi- 
mation scheme. 

The only assumption made under the sliding window approximation is 
that essentially all failures can be detected within W time steps after they 
have occurred, or that if a failure is not detected within this time it will 
not be detected in the future. Thus, when we have progressed to time k vre 
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can be quite confident that nc failure has occurred before tiB« k-H41, 
and we only need to consider the possibility of a failure occurring at 
soiae T^k-W+1. Consequently we have modified the nature set and terminal 
decision set at tine k to be 0 (k) and p (k) , respectively: 

^ (k) = {(i,T), i=l,...,M, T^k-W+l}, and p**(k) contains all the elements 
of P(k) that iher specify failure times in the interval [k-W+l,k], or 
do not specify any time at all. The prior probability mass function 

w 

11 (k;i,T) defined over 0 (k) may be regarded as the conditional probability 

that a type i failure will occur at time T^k-W+1, given that rw failure has 

— ^ 

occurred before k-W+1. Using the memory less nature of y (see Section 4.2),^* 
can be easily shown to be 

( 0 T<k-W+1 

H (k;i,T) = ; 

( a(i)pa-p)'^"’''^"^ T>k-W+1 

For kj^, k^ ^ W, the triplet {0*(k) , u'^(k) , p'^(k) } for k=kj^ is clearly a 
time-shifted version of that for convenient to define a new 

time variable T that is related to the failure time T by T = k-T,* T has the 
interpretation as the failure time relative to the decision time k, i.e. 
a positive (negative) T indicates a failure time that is |t| step before 
• (after) k. Using this notation, at each time k, we have the natxire set, 
prior probability mass function, and the terminal decision set in the 


following forms: 
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e” = {(i,T), t=W-1,W-2,...,} (5-1) 

0 

M— T— 1 

a(i)pd-p)” ^ 

w 

V = {6 with time specifications relative to k that (5-3) 

are not earlier than k-W+l} 

Let \is recall three useful properties of the BSDP under investigation; 

i) the only time dependence of the failure signatures is manifested 
through the dependence on the elapsed tiroes since the onset of failures, 

ii) the cost of incorrect failure time identification is a function of 

the difference between the true and declared times, and iii) the decision 

delay cost is proportional to the delay. It is now clear that the terminal 

decision problem at different times beyond W are time-shifted versions of 

^ ^ ^ 

the problem defined by {0 , y , P , L, g}. Consequently the terminal de- 

w 

cision rule d mapping the sliding window of residuals [r (k-W+l) ,r(k) ] 

w 

into p is a W-sample (Bayes) rule that is the same for all k^W. Similarly, 

the stopping problems encountered at different stages of the sequential 

WWW 

decision process are defined by the same six elements: {0 , y , p , L, W, g}, 

w 

c»nd the stopping rule 4> defined over the san^le space of the sliding window 
of residuals [r(k-W+l) , . . . ,r(k) ] is the same for all k5^. Therefore, the 

w w 

sequential rule (<|» ,d ) is much sinpler than the BSDR which consists of a 


-W,. 

y (i,T) = 


T<W 


(5-2) 
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inf ini te sequence of time dependent rules. In addition, only a finite 

H H 

Storage is required by ,d ) for the window of data as opposed to the 
growing storage needed by the BSDR. Because of these desirable features, 
the sliding window approximation will be the basis of our suboptimal rule 
designs. 

w w 

since the sequential rule (4> ,d ) uses a sliding window of residuals 
(hence called a "sliding window sequential decision rule") , it requires 
mandatory scuiqpling through the initial W steps in order to fill up the data 
window. One minor dravA>ack of such a feature is increased delays in detec- 
ting failures occurring within the first W time steps. Fortunately, all 
practical window sizes are reasonably small so that the probability of a failure 
occurring within the first W time steps is negligible, and the above mentioned 
detection delays will not have any significant impact on the overall perfor- 
mance of the sliding window rule. 

A more important design aspect introduced by the use of a window rule 
is the tradeoff between detection performance and computational complexity. 

A window rule with a long window is more likely to deliver good detection 
accuracy them one with a short window, because with a long window, more data 
is used and more possible failure times can be considered. But on the other 
hand, a long window rule requires more conputations for both the off-line 
performance evaluation during the design process and the on-line processing 
of the window of data to generate the decisions. From our vantage point, the 
window size W is considered along with the prior probability n and the cost 
functions L 2 md w as design parameters within the Bayes formulation that may 
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be adjusted to achieve a satisfactory sliding window decision rule. This 

will be discussed further in subsection (5.3.2.) The Bayes design problem 

W W 

now becomes: for a set of y, W, L and c, find a ,d ) that minimizes 

w w 

the sequential risk u «^ ) • it stands, the solution of this problem 

s 

still re(]uires a tremendous amount of computation, albeit much less than 
that required for the BSDR. In the next sxibsection we will examine the 
computational structure associated with the risk evaluation for sliding 
window rules in order to indicate now how further simplifications may be 
introduced. 

5.1.2 Sliding Window Sequential Decision Rules 

w w 

Similcu: to the BSDR, the window rule (4> ,d ) divides the sample space 
of the sliding window of residuals, or equivalently, the space of vectors 
of posterior probabilities (q) , likelihood ratios (A) , or log likelihood 

•V 

ratios (L) of the sliding window of failure hypotheses into disjoint 

sequential decision regions {Sq,Sj^, the residuals are 

assumed to be Gaussian variables, the log likelihood ratios are simpler to 

work with than the likelihood ratios or the posterior prob^lbilities. We 

will only use the log likelihood ratios as the decision statistics. Now 

w 

suppose there are N elements in V and these elements are indexed such that 
H 

V = {s, ,...,S„}. In terms of the sequential decision regions defined in 
1 N 

*» 

the L-space, the sliding window rule states: At each time k^W, we form the 
decision statistics L(k) from the window of residual samples. If L(k)es^, 
for i=l,.., or N, we will stop saiqpling to declare 6^; othex>/ise, L(k)es^, 


euid we will proceed to take one more observation of the residual. The 
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Bayes design problem is to determine a set of regions {s*/SS . . . ,S*} 

OX N 

w 

that minimizes the sequential risk U ({s. }), i.e. 

S 1 

{s*} » arg min u'^CCs.}) (5-4) 

^ {S.} ® ^ 

Expression (5-4) represents a functional minimization problem for which a 
solution is generally very difficult to determine. A simpler alternative 
to this problem is to constrain the decision regions to take on special 
shapes, {s^(f) } , that are parameterized by a fixed dimensional vector, f , 
of design variables. Then the resulting design problem involves the 
determination of a set of parameter values f* that minimizes the risk 

Ug({s.(f)}) 

f* = arg min 0**({s.(f)}) (5-5) 

f S X 

In this study, we will focus our attention on a special set of para- 
metrized sequential decision regions, because they are slnple and they 
serve well to illustrate that the Bayes formulation can be exploited, in a 
systematic fashion, to obtain simple suboptimal rules that are capable of 
delivering good perfonumce. Next, we shall describe this set of siirple 
decision regions. 

The window of failure hypotheses consists of 
0 = {(i,T), i-0,l,...,M, t»W- 1,W-2, . . . } , where (0,-) denotes the hypothesis 
of no failure in the window. Suppose the terminal decision set is of the 
form V = {(j,t) , t»0,...,W-l} with (j,t) corresponding to 
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declarlng a type j failure occurring at time k-t. The sequential decision 
regions vre will study are of the form: 

S(j,t) - (Uk) : 

Uk; j ,t) >f( j ,t) 

e"^(j,t) (L(k;j,t)-f{j,t)]>e"^(i,T) IL(k;i,T)-f(i,T)], 

(i,T)j!<(j,t)} (5-6a) 

S(0,-) - (Uk) ; L(i,T)<f(i,T) , i=l,...,M, T=0,...,W-1} (5-6b) 

where E(k) « [L(k; 1,0) , , . , ,i(k;M,W-l) ] ' ; S(j,t) is the stop-to-declare- 

(j,t) region, emd S(0,-) is the continue region (see Figure 5-1 for an 

illustration in two dimensions) . Note that this set of decision regions 

may be easily modified to accomodate the case where V has some of its 

elements replaced by a composite decision, e.g., if { ( j,0) , . . , ( j ,W+1) } 

, W-1 _ 

IS replaced by o = _U (j,t) (i.e. declaring a type j failure without 
t“0 

regard of the failure time) , we have the stop-to-declare-6 region 
W-1 _ 

S(5) = _U S(j,t). In (5-6) the f's are known as the decision thresholds, 
t=0 

and the c's are the normalization constants. (As shown in Figure 5-1 for 
the 2-dimensional case, the e's determine the slope of the boundary between 
two stopping regions). Generally, the e's may be regarded as design peura- 
meters along with the f’s. In this study €(i,T) is sing>ly tzdcen to be the 
standard deviation of L(kji,T). 

Recall that the residual saunples are Gaussian variables. Then the log 
likelihood ratio L(k;i,x) is given by: 





-131 


T T 

L(k,i,T) - I g'(s)v"^r(k-T+8) - 5 I g!(s)v“^g (s) <5-7) 

8-0 ^ 8-0 

Since the second term in (5-7) is a function of i and T and is constant 
for all L(k;i,T) , k-W, W-i-1,..., it may be absorbed into f(i,T) in the 
definition of the decision regions. As a result* the decision region may 
be re-defined in terms of a set of new decision statistics, L(k) s 

S(j,t) - U(k) : 

i(j,t)>f{j,t) 

e"^(j,t) tL(k;j,t)-f(j,t)]>e‘^(i,T) [L(k;i,T)-f(i,T), 

(i,T)j<(j,t)} (S-8a) 

S(0,-) - U(k)s L(k;i,T)<f(i,T) , i-l,...,M, T-0,...,W-l} (5-8b) 

where 

L(kji,T) - I g'(8)v"^r{k-T+s) (5-9) 

s-0 

and f(i,T) has absorbed the constemt term in (5-7) . The decision statis- 
tics L(k) can be viewed as the state of a linear time invariant system 
driven by the residual. To see this, we will define the following 
notations : 

L,(k) - (L(k;l,T) ,..! ,UkfM,T) ] • T-0,...,W-1 (5-lOa) 

T 

Uk) - (LQ(k),...,L^_^(k)]' 




(5-lOb) 
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(T)V 


-1 1 




T-0, . . . ,W-1 


g;(T)v 



(5-lla) 





0 

0 



0 10 


(5-llb) 


(5-12) 


Then, from the definition of /.(k), we have 

L(k+1) » JL(k) + Gr(kHl) k=0,l,... (5-13a) 

UO) - 0 (5-13b) 

Note that L(k) is of dimension MW, and it is also a Markov process 
under any failure hypothesis. 

With the sequential decision regions defined, we are now ready to 
examine their associated risk. First, it is convenient to define S^(k) 
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to be the event that no failure declaration has been 0 &de up to and 
Including tine k, i.e. 

SjjCk) - a(k)es{0,-), L(k-I)e S(0,-) ,...,uw)e (5-i4) 

Since sasqpling through the first H time steps is mandatory, it is not 
necessary for (5-13) to include S(k)e S(0,-) for k<W. Using the sequential 
rule defined by (5-8) in the risk expression (4-26) , we get 

M » T-1 M W-1 

u.(f) - L_ I I M(i,T) III P^U(k)e S(j,t) . S.(k-l) |0,-} 
i-1 T-W+1 k-W j-1 €-0 ^ ^ 

M « "0 M W-1 

+ I I P(t,T) I II Ic(i) (k-T) + L(i,j,(k-t-T)J 
i«l T*1 k>* j»l t*0 

maxtw,T] 

X Pr{L(k)e S(j,t) , SQ(k-l) ] i.x} (5-15) 

w 

where we have used y (f) to denote the sequential risk due to a set of 

s 

sequential decision regions with window size W and parameterized by f. 

Note that the mandatory continuation of the sanqiling process through the 

first H steps is reflected in the lower summation limits for T and k. 

W 

To evaluate u (f ) , we need to determine the set of probabilities, 

8 

{Pr{i(k)€ S( j,t) , SQ(k) I i , t}, k>W, j«0,l, ,M, t«0, . . . ,W-l}, which, indeed, 

is the goal of many research efforts in the so-called level-crossing problem 
[341. Unfortunately, useful results (bounds and approximations of such 
probabilities) are only available for the scalar case [35] , (36) , (37) , i.e. 
in terms of our problem, i(k) is a scalar and the decision regions become 
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intervals on the real llne« and thus are not applicable to our problm. 

For the general multidinensional problem, we presently have to resort to 
numerical methods. As it stands, each of the probabilities is an integral 
of a kMM-dimensimtal Gaussian density over the expound region S(0,-)x... 
x8(0,-)xS(j,t) , which, for large kMM, becomes extremely unwieldy and dif- 
ficult to evaluate. However, the common structure of the probability 
events — the S^Ck-l) of the event {l(k)e S(j,t), S^Oc-l)}, may be exploited 
to obtain a more tractable compu'-ation structure for the probabilities. Tta 
accomplish this, we can use Bayes rule to arrive at an recursive expression 
for the probabilities: 


P(L(k+l) ISqOc) ,i,T) •[/ P(L{k) ISpCk-l) ,i,T)dUk)j"^ 

X f Pa(k+1) lUk) ,S^(k-l) ,i,T)P(Uk) |S (k-1) ,i,T)dL(k) 


S(0,-) 


k>W 


(5-16) 


PrU(k)€ S(j,t) ,SQ(k-l) |i,T) 


- Pr{SQ(k-l) |i,T} y p(Uk) IS^Ck-l) ,i,T)dUk) , 


S(j,t) 


j-0,1,. . . ,M, t-0,. .. ,W-1 


(5-17) 


with 


Pr{t(w)e s(j,t) |i,T} - J p(i(w) |i,T)dL(w) , 

s(j,t) 

j*0,l,...,M, t"0,...,W-l 


(5-18) 


where p(L(k+l) jSQ(k) ,i,T) denotes the conditional probability density of 
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L(k^l ) , giv«n S^Oc) and that a type i failuta hat occurrad at tijaa T» 

pUClcfl) |t(k) , S^(k-l) ,i >T) is the density for the transition fr«n i(k) 
to Lik+1) , given S^(k-l) and (i,T); p(l(W)|i,T) is the (Gaussian) density 
for L(W) under a hypothesis (i«T). Using (5-16) -(S-18) « ve have to contend 
with MM-dinensional integrals instead of integrals with increasing dinensions 
as required by the straightforward evaluation of the probabilities. 
Nevertheless, this problem is still difficult, since even for M«2 and ff>10 
the integrals are 20-dimensional. 

In the remainder of this section we will examine the c«i^tational 
COT^lexity associated with tt>cs evaluation of (5-16) -(5-18) . First, we will 
consider the transition density in (5-16). Noting that L(k) is a Markov 
process we have 

p(i(k+l) iKk) ,SQ(k-l) ,i,T) - pa(k+l) lUk) ,i,T) (5-19) 

From (5-19) , we have 

t(k+l) - JL(k) - Gr(k+1) (5-20) 

The dimension of L(k) , MW, is generally greater the rank of G, which is 

assumed to be m>M (m is the dimension of r) . The increment l(k4-l) - JL(k) is 

due to r(k-«-l) and can only lie in, at best, m-dimensional subspace of 
MW 

R . That is, the one-step transition density in (5-19) is degenerate. 

Then, using the fact that r(k-t-l) and L(k) are independent of each other, 
it can be easily shown that 

p(L(k+l) lUk) ,i,T) 

- UQ(l|(I-G(G’G)"^Gnt(k+l)-JUk)Jll) Ig'gI"^ 

X p(r(k+l)-(G‘G)'^G’ lUk+l)-JirV) ] |i,T) (5-21) 
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where is the is^lse function, |c'g| is the determinant of G'G, and 
p(r(k-fl) |i,T) denotes the Gaussian density of r(k-i-l) under (i,t). As 
expected, the transition density (5-21) for a given L(k) is zero for some 
L{k*i\ , namely those values such that (Kk-fl) -JL(k) } cannot be accounted for 
with any r(k+l) , i.e. when {I-G(G'G) ^G* ) fUk+1) -JL(k) Jf<0. 

Finally, the recursive equation (S-16) for the conditional density of 
L can be re-written ast 

p{L(ktl) 5Q(k),i,T) - 

X y* |G'Gr^p(r(k+l)-(G’G)'^G' lL(k+l)-JUk) ] |i,T)p(L(k) jS^Ck-l) ,i,T)dt(k) 
S(0,-;O F(L(k+D) k>w (5-22) 

where 

F(t(k+1)) - {L(k) :[I-G(G'G)"^G') [Uk+l)-JL(k) ] - O) (5-23) 

The set r(L(k-t-l)) is the set of all L(k) 's that together with some 
r(k+l) will produce the given L(k+1) . If rank Gq<M (see (5-11)), the 
pair (Jk«) representing the system (5-13) is uncontrollable, and there are 
some values oi L that cannot be generated by (5-13). This corresponds to 
the case, for example, when two failure hypotheses represent different mag- 
nitudes of the same failure node. The set L's satisfying the above condition 
can be determined from (J,C) , and the probability density for such l's will 

not have to be calculated, as it will always be zero, ftore generally, 

MM 

F(l(k+1)) Is a linear variety in P and becomes the empty set only for 


u 


S(0,-) 


p{l(k) ISc)<k-l) ,i,T) 


dl(k)l 
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certain values of L(k+1). Hence the region of integration needs to be 
determined as a function of / (k+1) . Although the scalar version of (5-22) , 
where regions are intervals on the real line and are independent of L(k+1) , 
has been successfully evaluated using numerical quadrature methods [38], 
there is presently no efficient method available for solving the general 
mlti-dimensional problem. The difficulties are due to the large dimension 
of L and the complex regions of integration, S(0,-) O F(l(k+1) ) , as indicated 
above. However, when the rank of G is the same as the dimension of L, the 
rer .c n of integration will simplify to S(0,-). TSiis is because when rank 
tranjltic.-is from and L(k) to any L(k+1) are possible and F(t(k+1))«R****. 

An algorithm has been developed to perform the integration for this special 
case in low dimensions, and it will be described in Section 5.2. The condition 
that rank G = MH is not as restrictive as it appears, and we will see that 
the algorithm based on this assumption is useful in determining the risks 
associated with some sin^lified decisions rules to be discussed in the next 
two s\ibsections. 

In this scibsection we have examined the problem of designing sliding 
window sequential decision rules. Several simplifications directed towards 
practical solutions have been discussed. In addition, we have identified 
the structure of the computations required for evaluating the risk and per- 
formcince associated with a sliding window rule. Based on these insights we 
will propose two simple decision rules (a simplified sliding window rule 
and a non-window rule) in Subsections 5.1.3 and 5.1.4 for which the risks 


CcA be evaluated or approximated by existing numerical techniques. 
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5.1.3 A Simplified Sliding Window Decision Rule 

Ihe nulti-dimensional integrals (5- 16) --(5- 18) encountered in the 
calculation of the ris)c are due to the MW-dimensional vector of decision 
statistics L(k). These statistics correspond to the MW failure hypotheses • 
and they provide the information necessary for the simultaneous identifica- 
tion of both failure type and failure time. In most applications « such as 
the aircraft sensor FDI problem [ ix 1 emd the detection of freeway traffic 
incidents [5 1 » the failure time need not be explicitly identified. In 
such cases, the terminal decision set reduces to P =0= j*l,...,M}, 
where the index j denotes the declaration of a type j failure. Since the 
decision does not directly concern the onset of failures, the failure tia« 
resolution power provided by the full window of decision statistics is not 
needed. Instead, decision rules that employ a few components of L(k) may 
be used. The decision rule of this type considered here consists of se- 
quential decision regions that are similar to (5-8) but are only defined in 
terms of M components of L(k) : 

S . = { V,(k) : 

Uk; j,W-l)>f , 

J 

e"^(j,W-l) (L(k}j,W-l)-f J>e'^(i,w-l) [L(k,i,W-l)-f .1, Vi,<j (5-2<a) 
Sq = {L„_l(k) : L(k,j,W-l)< fj j=l,...,M} (5-24b) 
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«^ere is the stop-to-declare-j- region and is the continue region. 

At each point "n tine, the set of decision regions specified by (5-24) 
nay be regarded as a decision rule for determining if a failure has 
occurred at the earliest point in the sliding window (i.e. at k-W+1) , and 
it is similar to a W-sample decision rule for testing M+1 hypotheses, “niis 
vantage point readily provides a rough guideline for choosing the window 
size W, n 2 unely, W should be sufficiently long so that enough residual sam- 
ples cam be used to achieve acceptable detection accuracy in the non- 
sequential testing of M+1 hypotheses (M hypotheses indicating the possibility 
of rue of M different failures occurring at k-W+1 plus the hypothesis of no 
failure at k-W+1) . Because the actual decision problem is a sequential 
one, rather than a static one this guideline will only serve to provide am 
initial choice of W that may be later adjusted to achieve a better perfor- 
mamce. (We will discuss the choice of W along with other design paurameters 
of the Bayesian approach in Subsection 5.3.1). It should be noted that the 
use of (5-24) is effective if cross-correlations of signatures among hypo- 
theses of the same failure type at different times are smaller tham those 
among hypotheses of different failure types. 

Next, we will examine the risk associated with the sequential rule 
(5-24) . The following equations for the risk con 5 >utation are specializations 
of those of the previous section to the simplified sliding window rule. 

We have 

M ® T-1 M 

s I I y(i,T) I I p {L (k)es , S (k-i)lo,-} 

® i=l T=W+1 k*W j=l ^ J 

M 00 00 M 

+ I I V(i,T) I I [c(i) (k-T)+L(i,j)] 

i=l T=1 k= j=*l 

max[W,T] 

X Pr{l^_j^(k)es^, SQ(k-l)|i,T> 


(5-25) 
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where 


^0^^’ “ ®0' 


>L, 


W-l 


(w)e SqJ 


(5-26) 


The probabilities required for calculating the risk (5-25) are given by 


p(i^-l(k+l) |SQ(k) ,i,T) ‘ 


“ c/ p(i„_i(k) |SQ(k-l).i,T)dL^_^(k)]"^ 

(k) ,SQ(k-l) ,i,T)pa^_^(k) ISQ(k-l) ,i,T)di^^_j^(k) 

k>W 


(5-27) 


Pr{L, - (k)€S., S„(k-1) |i,T} 

W-l j 0 ' 

= PrfSgCk-l) ii,T>y* p(i^_^(k) ISQ(k-l) ,i,T)di.^_^(k) j=0,l,...,M (5-28) 


with 


Pr{L^_l(W)eSjii,x} = J P(i-(,_i(W) |i,T)dL^^_^(W) (5-29) 

S . 

D 


In contrast to the MW-dimensional integrals associated with the sliding 
window rule discussed in the previous subsection, the integrals in (5-27)- 
(5-29) are M-diinensional. For M small, say less than 4, numerical inte- 
gration of (5-27) -(5-29) becomes manageable. 

Unfortunately, the transition density, p(L ,(k+l)jl . (k) ,S (k-1) ,i,T) , 

W“ X ff u 

required in (5-27) is difficult to calculate, because L , (k) is not a 

W-l 

« 

Markov process. As an alternative, the Markov nature of L(k) c^m be exploited 
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once again to determine the required conditional density and probabilities 
as follows: 


p(L{k+l) ISqCK) ,i,T) - ISqOc- 1) ,i,x)dL(k)]"^ 


: J|G'Gr^p(r(k+l) = (G’G)“^G' [L(k+l)-JL(k) } |i,T)p(Uk) (S^Ck-l) ,i,T)dL(k) 


S^nFUCk+D) 


(5-30) 


Pr{I^_^(k)eSj, SQ(k-l)|i,T} 


= Pr{S^ 


(k-1) |i,x} 



s. 

D 


p(L(k) IS^Ck-l) ,i,x)dUk) 


(5-31) 


~ Iff 

where is the extension of in the L(k) -space, i.e. R , i.e. 

S. = {L(k) :L .(k)es.}, j=0,l,.,.,M. The obvious drawback in using (5-30) 

D w-i ] 

emd (5-31) is that we have to deal with MN-dimensional integrals again. 
■Hierefore, in order to exploit the low dimensionality of (5-27) and (5-28) , 
we will have to use an approximation for the transition density in (5-27) . 
In the remainder of this subsection we will describe a simple approximation 
of p(L^_j^(k+l) |L„_ 3 ^(k) , SQ(k-l) ,i,x) . 

It is useful to note that in approximating the required transition 
density for we are, in fact, approximating the behavior of 

A simple approximation is a Gauss-Markov process £(k) that is defined by 


)l(k+l) = A]l(k) + C(k+1) (5-32) 

E{Uk)5' (t) } - BB'UQ(k-t) 


(5-33) 
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where A and B are MxM constant matrices and C is a white Gaussian sequence 
with covariance equal to the (MxM) matrix BB*. The reason for choosing 
the model (5-32) and (5-33) is twofold. Firstly, just as 
is Gaussiw. Secondly, iUlc) is Markov so that its tr 2 Uisition density can 
be readily determined. In order to have S,(k) behave like L^_j^(k), we set 
the matrices A and B and the mean of A such that 


JUk)} = E. JL^ ,(k) } 
1,T 1,T W-1 

(5-34) 

;Q^_{Jl(k)£-(k)> = EQ^_{L^_^(k)L^_^(k)} 

(5-35) 

;Q^_{Jl(k)il’(k+i)} = EQ^_{L^_^(k)Lj^_^(k+i)} 

(5-36) 


That is, we have matched the marginal density and the one-step cross- 

covariance of jl(k) to those of L, , (k) . It can be shown that (5-34)- 

W-1 

(5-36) uniquely specify 

A = e' e"^ <5-37) 

1 0 


BB' = E - e’ Z (5-38) 

0 10 1 


E. ^(C(k+1) = E. ,(k+l)} - A e{L„ ,(k)} (5-39) 

IfT W*“X W*"x 
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where 


W-1 

to- 

W-2 

ti- Vi 


(5-40) 


(5-41) 




0 

k-T 


^ ^t-k 

t=0 0 ^ 


w-1 

I =t'' 

t=o "’'o 


T>k 

kQ=k-W+l-T<0 

k =k-W+l-T>0 
u 


Moreover, the matrix A is stable, i.e. the magnitudes of all of the 
eigenvalues of A are less than unity, and B is invertible if or 

is of rank M. Because 5 is an artificial process (i.e. 5 is not a 'direct 

function of the residuals r) , £(k) can never be implemented for use in 
(5-24) . 

It should be noted that the model specified by (5-34) -(5-36) does 
not provide the only Markov approximation of j^(k). We may, indeed, 
choose to match the n-step cross-covariance (l<n<W) instead of matching 
the one-step cross-covariance as in (5-36) , or we may just approximate 
the cross-covariance function. Such a variety of possible models is the 
result of the relatively small number of free parameters available in the 
Markov model to be adjusted in order to describe the more complex j^(k) 
process. The suitediility of a criterion for choosing the matrices A and B, 
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such as (5-35) and (5-36) , depends directly on the failure signatures under 
consideration and may be examined as em issue separate from the decision 
rule design problem. Since our main goal is to demonstrate how the Bayes 
approach can be used in designing sequential decision rules, we will not 
pursue this issue further. Rather, we will proceed with the design problem 
assuming an appropriate Marltov approximation (5-32) of L^_j^()t) is available. 

Now we can approximate the required probabilities in the risk 
calculation as 


P U„ -(k)es., S„(k-1) |i,T}«P {)l(k)es., S-(k-l) i,x} 

3T W““X 3 V IT D ^ 


j=0,l,...,M k^W 


(5-43) 


eUld 


Pr{il(k)eSj, SQ(k-l) li,x} 


= Pr{SQ(k-l) 



p(il(k) ISQ(k-l) ,i,x)djl(k) 


(5-44) 


where we have applied the same decision rule to £(k) as L^_j^(k). Therefore, 

S, and S„(k-1) denote the decision regions and the event of continued 
3 0 

sao^ling up to time k for both L and Z. Assuming B exists, we have 

W** X 


p(£(k+l) |SQ(k) ,i,x) 


[/ 


p()l(k) ISQ(k-l) ,i,x)dJUk) 





p(^(k+l) = [£(k+l)-Ail(k) ] |i,x)p()l(k) ISQ(k-l) ,i,x)d£(k) 

k>W 


(5-45) 


where p(?(k) |i,X) is the Gaussian density of ^(^^5 under the failure (i,X) . 
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If £(k) satisfies (5-34) and (5-35) 

Pr{H(W)ra.|i,T} - Pr{i (W)es.li,T} (5-46) 

j 3 

In contrast to (5-30) and (5-31), t)ie evaluation of (5-44) and (5-45) 
requires only M-dimensional integrations over the decision regions. The 
complication in the region of integration due to F(i(k+1)) (in 5-30) is 
absent. An algorithm exploiting existing numerical techniques for computing 
(5-44) -(5-45) has been developed and it will be discussed in Section 5.2. 

In the event that B is not invertible, the region of integration in (5-45) 
will take the form similar to that of (5-22) in lower dimension (M instead 
of MM). Such an integral is very difficult to evaluate, and it represents 
an area for future research. Very often this problem can be circumvented 
by batch processing the residuals. That is, we may consider the modified 
residual sequence: r(R) = [r* (vK-v+1) ,r' (vR-v+2) , — ,r'(vK)]' for some 

batch size v>0 with Jt=l,2,... as the new time index. In using f(K) 
we have to augment the signatures as: (g^(0) , . . ,gj^(v-l) ] ' , i»l,..,M. 

By a proper choice of v, the rank of Gq can be increased to M and B will be in- 
vertible. An example of the batch process is included in the next s\ibsection. 
Therefore, we direct our attention to cases where B ^ exists. Under this 
condition, the algorithm in Section 5,2 can be used to obtain approximations 
of the sequential risk and the detection perfomuurice (i.e. the expected 
decision delays, probability of false alarm, etc.) of the simplified sliding 
window decision rule. Simulation aimed at assessing the accuracy of the 
probability approximations (5-43) resulting from the use of the Markov 
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model iUk) described by (5-32) -(^j-42) are reported in Chapter 6 together 
with the actual design of decision regions of the form (5-24) for a two- 
failure-mode problem. 

In concluding this subsection, we note that increased accuracy may 
t>e obtained by using a higher order approximation, l.e. when £.(k) is given 
by 


«,(k) » C H(k) (5-47) 

I(k+1) -Al(k) + B f(k+l) (5-48) 

where A and B are nxn and C is Mxn with n>M. 'ttie increase in accuracy 
is achieved at the expense of Increased computational con^lexity, since we 
have to contend with n-dimensional probability integrals over regions of 
the form in (5-30). When n*MM, (5-47) and (5-48) will provide an exact 
description of we are, once again, confronted with (5-30) and 

(5-31) . Due to the lack of eui efficient algorithm for calculating the 
required integrals, this subject of higher order approximation is not 
purtmed any further in this thesis. 

5.1.4 Non-Window Sequential Decision Rules 

In the previous subsection we have discussed the simplified sliding 
window rule in which the M decision statistics are formed from a window 
of resldxial samples. Here we will describe another simple decision rule 
that has the same decision regions as the simplified sliding window rule 
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(5-24) « but tha vector (z) of M decision statistics is 
differently as follows: 

s(k+l) - A Z(k) -t- B r(k+l) (5-49) 

where A is a constant stable MxM matrix, and B is a Nxn constant matrix 
of rank M. Unlike the Markov model )l(k) that approximates L^_^(k), 
z(k) is a realizable Markov process driven by the residual, ^e obvious 
advantages of using s as the decision statistic are: 1) less storage is 

required, because residual samples need not be stored as necessary in the 
sliding window scheme, and 2) since s is Markov, the required probability 
integrals are of the form (5-44) end (5-45) , autd the algorithm to be 
described in Section 5.2 can be directly applied to evaluate such integrals. 

In order to form the statistics s , we need to choose the matrices A 

«w 

and B. When the failure signatures under consideration are constant blAsesi 
B can sinply be set to equal G^, and A can be chosen to be ai, where 
0<o<l. Then, the term Br in (5-49) resembles Gr of (5-19) , and it provides 
the correlation of the residual with the signatures. The time constant 
(■£~) of c characterizes the memory span of Z just as w characterizes that of 
the sliding window rules. When a is close to one, residtial saisples from 
long ago are remembered, and when a is zero, Z(k) is just Br(k) . ITierefore, 
a (or A in general) can be regarded as a design parameter playing a role 
similar to that of W in the sliding window scheme. 

More generally, if we consider failure signatures that are not constant 
biases. Then the choice of A may still be handled in the same way as in the 
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a* in the conatant-biae caae, but the selection of a B Mtrijc is aors involved. 
Qualitatively, the role of B (just as G in (5-19)) is to bring out the failure 
signature contained in the residual. Therefore the rows of B should re^esent 
SOM characteristic directions of the failure signatures in question, e.g. 
in the constant-bias case, the rows of B are sinply the signatures of the 
failures. As the signatures are not constants the ctwlce of s\ich characteris- 
tic directions is not straightforward and is very much problem dependent. 

With some insights into the nature of the signatures, a reasonable choice of B 
can often be made. To illustrate how this may be accOT^lished, we will con- 
sider an exan^le with two failure modes and an m-dimensional residual vector. 
Let 

g^()t-T) - 0^ (5-50) 

g^Ot-T) • BjOc-t+I) (5-51) 


That is, g^ is a constant bias, and g^ is a ramp. If 0^ and 0^ are not 


multiples of each other a simple choice of B is avallables 


B • 


0 . 


(5-52) 


If And scalar constants, the above 

choice of B has rank one and is not very useful for identifying either 
signature. Suppose we batch process every two residual sai^les together, 
i.e we use the residual sequences r(K) - (r* (2)i-l) ,r* (2R) , K«l,2,... 
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Ihm w« can sat i to ba 


B 


6 - 3 * 

$• 28 ' 


(5-53) 


Thus, the first and second rows of B capture the constant-bias and raqp 
nature of and g^« respectively (and this i has rank t%«o) . Ihe use of 
the siodified residual r(R) in this cas^ eau.tes no adverse effect, since it 
only lengthens slightly the interval between tines when teminal declsimxs 
nay be aade. A big increase in such intervals i.e., the batch processing 
of r(k},...,r(k-fv) simultaneously for large v, nay however, be undesirable. 

The above simple example serves to show that the applicability of the 
decision statistic z is not as restrictive as it first appears to be. In 
any event, the matrices A and B may be regarded as design parameters just 
as the TOst functions and prior probability mass function. The merit of 
any choice of A and B may be assessed by determining if the decision rule 
based on such choices yields good performance. The algorithm of Section 5.2 
will aid in evaluating the risk associated with using x in the decision rtile, 
and the risk-minimisation algorithm to be discussed in Section 5.3 can be 
used in obtaining a decision rule that has as small a risk as is possible for 
a particular choice of A and B. The design of a decision rule using s as the 
decision statistic for a two failure-mode problem is reported in Chapter 6. 
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Nhils th« ctAtlstie B 1b pot«ntlBlly utaful for b wide rang* of 
pr^lBBUi^ BxpBCt its effsctivsnsss to diainish for problsns tidisrs ths 
slgnaturss vary drastically as a function of the elapsed tiise, or the dis- 
tii^uishability among failures depends eventually on these variations. This 
is due to the fact that a constant B is not adequate to capture the essence 
of rapidiy varying signatures. In such cases the sliding window decision 
rule should provide better performance because of its inherent nature to look 
for a full window's worth of signature. However, this still leaves cany 
applications for which s is a useful statistics. 

It is possible to use a higher order s (similar to T of the last 
subsection). It is not considered here, because in fact, it is mimicking 
the sliding window statistic In addition, the increased order cosspllcates 

both implementation and the CMig^utation of the required probability integrals, 
now having the form (5-30) and (5-31) . Such added complexity will negate 
the advantages of using z. 

5.2 gvaluation of the Risk awl Performance Probabilities 

In this sectim we will examine the problm of computing the risk 
associated with the decision rules discussed in the last tt«o subsections. 

An algorithm based on standard numerical quadrative techniques has been de- 
veloped for calculating the conditional density (5-45) and probabilities 
(5-44) recursively, since the decision statistic * of subsection 5.1.4 and 
the approximation (£) of the sliding window statistic of subsection 


5.1.3 are both Markov processes with the required calculations in the form 
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of (5-44) and (5-45), this algorithm is directly applicable to both cases. 

We will only describe the algorithm for the two failure-node problem, 
although it may be easily generalized to an arbitrary nunber of fail\ire 
modes. It will becme clear, however, that due to the exponential increase 
in computational requirement as a function of the number of failure nodes 
the algorithm is only practical for decision rules dealing with a few failure 
modes. This problem is intrinsic to the numerical evaltiation of multi- 
dimensional integrals and, in general, cannot be avoided. The approach to 
the design of a robust residual generation process undertaken in Chapters 2 
and 3 will aid in limiting the number of failure modes to be considered 
simultaneously by a decision rule. Since each residual generation process is 
based on a part of the system, namely the most relevant and parameter-insen- 
sitive part, it will include only a stobset of all the possible failure types 
of the whole system. Then, it is likely that a decision rule employing the 
residual from such a process will have to deal with only a small number of 
failure modes. 

A brief review of the quadrature technique en^loyed In this study is 
inclijded in subsection 5.2.1, while the actual algorithm for calculating 
the conditional density and the required pjrobabilities is described in 
sxibsection 5.2.2. In subsection 5.2.3, we will discuss the risk evaluation 
problem. 

5.2,1 Gaussiaui Quadrature Formulas 

Nvmierical integration generally involves the approximation of a definite 
integral by a finite sum. The most widely studied method is of the form 
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f " 

I w(x)F(x)dx « ^ 

s=l 


V®F (x®) 


(5-54) 


where s is an index for the points used in the formula, x is a scalar 
variable, 2 uid w(x) is a function for which the integrals 


& 


/ 


}c s s 

w(x)x dx, k=0,l,2,... are defined and finite; v and x are known as 


the weights and nodes, respectively. When w(x) is nonnegative in [a, 8], a 
set of weights and nodes can be found so that the approximation (5-54) 
becomes exact for F when it is a polynomial of degree less them 2n. Such an 
approximation is known as a n-point Gaussian quadrature formula, or Gaussian 
formula [39]. Based on the theory of orthogonal polynomials, efficient 
Gaussian formulas have been determined for the 1-dimensional integr 2 U. for a 
variety of w and intervals [a, 8] • Attempts to develop simileu: formulas for 
several dimensions has met with little success. The most connon approach to 
M-dimensional integration is to regard the integral as a H-fold iterated 
integral and apply a 1-dimensional formula to each V 2 uried>.le separately. The 
resulting formula is called a product formula, e.g. in two dimensions. 


h -1 


/ f F(x^,X2)dx^dx, 


“2 “l 


^2 h 

I / 

*^2 “1 


n2 


w (x,)w (X > [w"^(x^)w ^(x ]F(x^,X 2 ) IdXj^dx^ 


t*l s=l 


(5-55) 
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8 8 

where we have assumed w^ (x) exists for Y*l,2, and v^> x^ are the weights 
and nodes of the n -point Gaussian formulas: 


f 

J ^ 


n 


(x)F(x)dx = \ V F(x®), ir-1.2. 


a. 


S“1 


Y Y 


(5*^56) 


This approach is the basis of chut algorithm for evaluating the integrals of 
(5-44) and (5-45) . The two 1-dimensional Gaussian quadrature formulas 
employed in the algorithm are 


09 

J' e”*F(x) a V® F(x^ 


s=l 


) 


(5-57) 


00 

/ 


_ 2 n 

e“* F(x)dx = I V® F(x®) 
. n n 
S“1 


(5-58) 


Now, V and x are the weights and nodes of the n-point laguerre-Gauss 
L 

•«s s 

formula (5-57) , and v^^ and x^^ are the weights 2 md nodes of the n-point 

Hermite-Gauss fosnmila. The weights ^md nodes for both of these formulas are 

s s 

tabulated for a wide range of n[ 40]. (xn fact, the nodes x^ and x^ are the 
roots of the n-th order Laguerre and Hermite polynomials, respectively.) 
Provided the integrals exist and are finite, we have the following formulas: 


w 

I 


F(x)dx « \ V® F(x_) 

- Ii Ij 

&sl 


(5-59) 


n 

f F(x)dx = I V® F(Xy) 
J S“1 


(5-60) 
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whexe 


..B 



'H 


B 2 

V e 
H 


( 5 - 61 ) 


( 5 - 62 ) 


For SOTO finite regions of integration, such as a s£diere, a cube, 
etc., estinate of the error associated with the product formula (5-55) are 
available [ 39 ] . These results are not useful for our problem, because 
they are dependent on the (higher order) derivative of the integr 2 ind function. 
The integrands of (5-44) and (5-45) involve the conditional density for 
which derivative information is very difficult to obtain, nie fact that we 
are dealing with probability integrals, however, will provide us with some 
handle on the error magnitude. We will discuss this when we describe the 
algorithm in the next subsection. In closing, we iK>te that further references 
on numerical integration may be found in the survey paper by Haber [ 41 ] . 

5.2.2 ftn Algorithm for Calculating the Conditional Density and 

Associated Probabilities 

The computational procedure described here will be applicable to both 
l(k) of subsection 5.1.3 ^md z(k) of subsection 5.1.4, since by setting 
Br()c+1) to be C(k+D we can see that both (5-32) and (5-49) have the same 
form. To facilitate discussion., we will use the simplifying notations: 

hj^(f(k)) = p(i(k) ISqOi-I),!,!) k>W (5-63) 

h (il(W)) - p()l(W) |0,-) (5-64) 

w 
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Pj^()lOt+l)-AlU)c)) - p($(lc+l) - jUk+l)-Al(k)) k^ ( 5 - 65 ) 


Pjj(j) - Pr{JUk)eSjlSQ(k-l),i,T)} k>W ( 5 - 66 ) 

^ere p(it(H)) is taken to be t))e steady no-fail density of I, i.e. we 
assume that we begin in the steauiy state at W. Note that the dependence on 
(i,T) is suppressed. It is understood that the above quantities have to be 
interpreted in context with sene (i,T) pair. 

For the decision regions have the form (see Figure 5 - 1 ): 


% - <»= ‘i 1 fi- <-2 i *2> 


( 5 - 67 a) 


Sj - U= e'"(VV> 


( 5 - 67 b) 




( 5 - 67 C) 


Then the propagation of the conditional density is governed by 


^ ^2 


hj^^j^(J,(k+D) » p”^( 0 ) I I Pj^ ( SL (k+ 1 ) -AS, (k) ) hj^ ( Jl (k ) ) di^ (k) di^ (k) 


..00 .00 


( 5 - 68 ) 


Substituting S,(k) ® f-y, we get 




// 

Q 0 


Pj^ (S (k+ 1 ) -A [f-y] )hj^ (f-y)dy j^dy^ ( 5 - 69 ) 
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/ 


Similarly, we can write 

’'k“” i \<t-yMy,ay. 




P^U) 


■ J J *^4 

-00 "'o ^ 

■/ /“■< 

y_oo ✓o 


•2«i(k))-y2j 




where 


a, ()l,(k)) = 

X 2 


^2(^)1 f 2 






a^dj^Ck)) = 




(5-70) 


(5-71) 


(5-72) 


(5-73) 


(5-74) 


The integrals (5-69)- (5-72) are in the forms that are suitable for the 
application of the product formula employing Laguerre and Hermite formulas 
Using for the integral from 0 to “ (5-59) and (5-60) for the integral from 


_qp to ®, 


the above integrals can be approximated as 
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-”l,«v r r _„s t 


I , I / tl)\ (1/ tl) 

t-1 s-1 Lf^ - Xj^J Lf 2 - x^y 

(5-75) 


\ \ 




^ - *L ' 


*2 - “l J 


(5-76) 


"h ”l 


P^(l) 




, t. s 
a^(Xn) - x^ 


(5-77) 


"h ”l 


P^(2> 


n -u ^ / 

■ .1 i ( 


r t 


(5-78) 


where an n. -point Laguerre formula and an n -point Hermite formula are used. 
Li H 

Ihe above approximations may be set to be equalities while keeping in mind 
that the qucuitities on the left houid sides becxane approximations of the true 
ones. Thus, (5-75) and (5-76) describe the propagation of the approximate 
conditional density. The probabilities of (5-44) can be approximated by 
using (5-76)- (5-78) in 

k-1 

Pr{JUk)eS^, Sptk-l) li,x} - Pj^(j) II Pg(0) (5-79) 

Due to the fact that only approximate values of Pj^(3)» j“0,l,2 are used in 
(5-79), the errors may accumate as k increases. Some feel for this 
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cxmalative error is obtained by means of comparisons with simulation results, 
and it is reported in Chapter 6. 

Since the probabilities in (5-76) - (5-78) should sum to one, l.e. 


^ P^(j) should equal one, hj^(Jl()c)) and j*0,l,2 are normalised 

j-0 

for all )c>W. This insures that Pj^(j) valid probedsllities and hj^(i(k)) 
will always be close to being a density function. The un-normalised sum. 


2 

I serve as a coaorse Indicator of the accuracy of the approxi- 

j»0 

mations. That is, if it is not close to one, we know that the approxiisatlons 
are poor and more points will have to be used in the quadrattire. Altitough 
the fact that the sum is close to one does not necessairily In^ly the quadrative 
is indeed accurate, we would be more confident in the approximations if this 
is in fact the case. 

Upon examining (5-75)- (5-78) , we note that for every k, there are only 

a fixed number of points in the f-plane for which h^ will have to determined, 

2 

neunely, n^ xx>ints in and point in S, and each. Moreover, these 

points do not vary with k. In order to calculate hL . for aay point I, all 


2 

n_ points in S will have to be used. An estinute of the cocq^tational 
L 0 

requirement can be obtained as follows. For simplicity, let us assixos 

n,«n «n. Suppose we call the evaluation of each term in the sunanation in 
L H 

2 

(5-75) a step. Then we need to perform n steps to obtain a new points. 

2 4 

Since there are a total of 3n points (due to three decision regions) , 3n 

steps wi'*' ,ave to be caurried out at each iteration. Table 5-1 shows the 
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n 

3n^ 

(steps/iter) 

8 

12288 

10 

30000 

12 

62208 

14 

115248 

16 

196608 

18 

314928 

20 

480000 

22 

702768 

24 

995328 

26 

1370928 

28 

1843968 

30 

2430000 


Assuming 


5x10 


sec/step 


TABLE 5-1 : Computation Requirements 


time/iter 

(sec) 

.61 

1.50 

3.11 

5.76 

9.83 

15.74 

24.00 

35.14 

49.77 

68.55 

92.20 

121.50 
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total mother of steps and the corresponding confutation time for iteration 

as a function of n. (For this table we have assumed a time of SalO^VvAo/stepf 

while using an un-optiinized code on an IBM 370/168 computer we have experienced 
-5 

rot^hly 7x10 sec/step.) For n<«16, sav> almost 10 seconds are required per 

iteration. Typically, a minimum of 30 iterations are required to calculate 

the risk approximately (see Section 5.2.3 for a discussion of risk evaluation). 

This gives 5 minutes per risk evaluation. In seeurching for a set of risk- 

minimizing decision regions (see Section 5.3.1), quite a few risk evaluations 

are usually needed. Therefore, even for the single two- failure mode case, 

the ccmputational burden is not insignificant. In general, the M-aode problem 
M 

will require (M4-l)n steps/iteration. It is obvious that M does not have to 
be very large before the amount of computation becomes too m\fh to handle. 

In order to construct a Laguerre or Hermlte formula that gives good 
results we need to choose two design parameters carefully - the number of 
points to be used in the quadrative and the scale factor of the variable of 
integration. Although we will use the Laguerre formula to illustrate the 
importance of these two parameters, the following discussion will aifly to the 
Hermite case as well as to product formulas conposed of Hermite and Laguerre 
quadrature rules. Figure 5-2 shows the location of the nodes of a 8-point 
a]id a 16-point Laguerre formula. It is evident that for a larger n, the nodes 
cover a larger range as well as providing a more dense covering for msall 
values of x. Thus, a large n is suitable for integrands that are "spread-out". 
When nx>re points are used, more computation is required. The choice of an 
appropriate n is based on the tradeoff between accuracy and cce^tational 
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load. lha li^port 'ce of an ai^ropriate acaling of the Integrand can be 
illustrated by considering a siapla 1-dinensional integral f l^(n)dx. 


/ 


With a simple scale change of the variable of integration, we can obtain 
m 00 

y* F(x}dx “ J ^ F( ^)<3yr where y-ouc and a is positive. The situation of 


applying the sati« Laguerre formula to the integral with different scale fac- 
tors a is shown in Figure 5-3. The dots on the y-axis mark the nodes. A 
small a has the effect of cong>ressing the integrand while a large a tends to 
spread it out. For an excessively large a, the range of the nodes does not 
span the integrand sufficiently, (ht the other hand, for an excessively sbmII 
a, the nodes do not capture sufficient details of F. In both of these cases, 
the approximation of the integral is expected to be poor. A good <^ice of a 
can only be made with some insights into the nature of the integrand. 

For the p^^sent problem, these two parameters are c!w>sen heuristically 
according to the above guidelines. Ziiqportant considerations include the 
spread of the integrands of (5-75)- (5-78) and the thresholds (f ) . The spreads 
of the int^egrands of interest are generally difficult to calculate. Consider 
the integrand of (5-75) . It is the product of Pj^ and hj^. Therefore, its 
spread is roughly the spread of the minimum of the spreads of and h^. 

Since the covariance of is much smaller than that of h^, the scaling of the 
variable of integration is chosen to minimize the advers' -ffect (of 
Figure 5-3) on p^^. For simplicity, the same scaling is used for all integrals, 
altlwugh different ones may be applied for more accurate results. Algoriths» 
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using differsnt scaling for each of the integrands in (5-75)- (5-78) should 
be investigated in the future. 

Depending on their sizes, the thresholds may also be the source of 

error for the qpiadrative. 1o see this, let us consider the l-diawnsional 

analog of (5-75), i.e. integrating over the continue region, as pictured 

in Figure 5-4. According to the foniula (5-75) , the nodes of the Leguerre 

formula are located relative to the threshold. For the larger threshold 

f^, ^aany of the nodes span the insignificant part of the integrand, while 
2 

for f , the nodes of the Laguerre formula (with the sane number of points) 
cover the integrand well. (Note that scaling will not improve the situation). 
Ideally, n, the number of points in the quadrature formula, should be chosen 
as large as is practical so that sufficient nuadser of tiodes will cover the 
significant part of the integrand, in this study n is roughly chosen so that 
the span of the iu}des (the distance betvreen the minimtsn and isaxisnaa nodes) is 
a few times the sum of the magnitude of the maximum threshold and the scaled 
spread (the square root of the largest diagonal element of the covariance 
matrix) of h^. As the spread of h^ is an approximation of that of h^^, this 
choice of n will provide sufficient covering for the integrands of (5-76)- 
(5-78) as well as that of (5-75) . (The spread of the former is larger than 
that of the latter) . 

It is notMl that the spsn of the Hermite nodes are vary small even for 
n large (see the table of Hermite roots and weights in [40 ]). The Hermite 

H 

formula is used in coi^ting the stopping probabilities Fj^(i) *nd P]^(2). 

In (5-77) and (5-78), the Hermite nodes are centered at the axes. Under the 
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no-fail hypothesis this will provide a sufficient covering of Pj^ in and 
S^. Under a failure nypothesis, Pj^ will shift into the stopping region. 
Thus, the short speuri of the Hermite roots together with the centering at 
the axes may not cover Pj^ in and well enough. To campensate for this, 

will shift the Hermite nodes into and S 2 * This is accomplished by 
modifying (5-77) and (5-78) as 






"1^4^ - *L 


Lx;; + X, (k;i,T) j 

li J. 


(5-77) • 


"h ”l / 

r r S t I 

J, \ VAl 

t=l s=l \ 




X + X (k;i,T) 
H 2 


/ s 

^ J 


) 


where 


(5-78) ' 


Xj(k;i,T) = min[f2_j,E(Jl^_j (k) |i,T)], j=l,2 


That is, as p, shifts into S, and (under a failure) , the Hermite nodes 
are also shifted into and by means of X^ and X^. Note that X^^ and X^ 
ou:e clipped at the threshold values. This prevents the Hermite nodes to 
move too far away from the thresholds into the stopping regions in the events 
that the expected value of £.(k) vmder a failure grows indefinitely. Thus, 
we have constructed a moving grid to cover the conditional density Pj^ in 
order to obtain better accuracy in the quadrature. (Comparisons among the 
results from using (5-77) and (5-78) , (5-77) ' euid (5-78]f , and Monte Carlo 
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simulAtioniii have shown the moving -grid approach to be more accurate that the 
static method) . The technique of using a moving grid should be ftirther 
studied in future efforts in developing algorithms for computing the detec- 
tion probabilities. 

In summary, we have described an algorithm for calculating (aj^uroximately) 
the conditional density and the probabilities required for the risk evaluation; 
and we have provided an assessment of its practicality. Effective choice of 
design parameters, the scale factor of the variable of integration and the 
number of points used in the Laguerre and Hermite formulas, has been discussed. 
Ihe performemce is assessed via comparisons with Monte Carlo simulations for 
various types of signatures and thresholds, and the result is reported in 
Chapter 6. In addition to aiding the design of decision rules, the present 
algorithm provides a simple framework for exploring finer issues of conqmting 
the probabilities so that the design of more effective 2 md efficient algorithms 
can be facilitated. Finally, we note that integration algorithms based on 
other 1-dimensional formulas I 39] may be constructed for our problem, although 
they are not examined in this study. 


5.2.3 Risk and Performance Probabilities 

In this section we will discuss the computation of the risk and 
performance probcdsi lines for a sequential decision rule in the form of 
(5-67) for the detection amd identification of the various failure modes 
(bat not failure time) . Before we proceed with the calculation, we will 
examine the behavior of the conditional density p(£.(k) |SQ(k-l) ,i,x) as a 
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function of k. Recall that £ is described by 

£(k+l) - A£(k) + 5(k+l) (5i-32> 

where A is a constant stable matrix, and ^ is a white Gaussian se<iuence 
with constant covariance BB'. In the absence of a failure, Z is xero mean, 
ai^ thus, is a statioxuury process. Based or. the fact that A is constant, 

C is stationary and the decision regions (in particular, the continue region) 
are time independent, we conjecture that the conditional density under (0,-) 
will approach a limiting density function as time progresses, i,e. 

lim p()l(k) lSA-l),0-) =p(£(k)) (5-80) 

k-Ko ° 

Qualitatively, the propagation of the conditional density consists of the 
following process. After the density at k outside the continue region is 
set to zero, it is normalized to become the density at k conditioned or 
continuing. Then it is canpressed (by the effect of A with eigenvalues of 
magnitudes smaller than one) and convolved with the density of The 
convolution has the effect of spreading the density out again over into the 
continue region. Since the matrix A, the density of and the continue 
region are all time- invariant, a steady state density is most likely to be 
reached. In fact, convergence is evident in all propagations of the con- 
ditional density, by means of the algorithm described in the last subsection, 
for various values of A, BB* and f (the thresholds defining S^) . Following 
similar reasoning, we also conjecture that the actual conditional density of 


the sliding window rule, i.e. (k) (S^Ck-l) ,0-) , behaves similarly, 

unfortunately, we have not been able to prove such convergence behavior 
using elementary techniques. More advanced function-thc»retic methods may 
be necessary, but they are beyond the scope of this thesis. Assuming that 
this behavior holds, however, we will be able to obtain a simple ai^>roacimate 
expression for the sequential risk. We will discuss this next. 

Assuming no failure has occurred, the conditional density will essen- 
tially reach a steady state at some finite time T>W. ^en, for k>T we have 


Pr{l(k)es^|SQ(k-l),0-} = 


(5-81) 


Pr{Jt(k)es^,i(k-l)eSQ,...,£(T)eSQjS(T-l),i,T} »b^(k-T|i) k^^ 

(5-82) 


lhat is; once steady state is reached, only the relative time (elapsed time) 

is important. Generally, failures occur infrequently, and decision rule 

with low false alarm probabilities are employed. Thus, it is reasonable to 

T 

assume 1} p«l ((1-p) -1), i.e. the failure rate is low (see Sec. 4.2), 
and 2) Pr{S^(T) |o,-} ^ 1, i.e. nearly no false alarm before steady state is 
reached. Using (4-3) , the sequential risk (5-25) for M>2 can be approximated 
by 


U^(f) 

s 


2 * - 1 T-1 2 _ . 

I I a(i)pd-p)^"-' I I (bbJ-V{So(T)lo,-)] 

i«l T=T+1 k«-T j=l ■' 

2 a> 00 2 

+ I I o(i)p(l-p)^"^ I I Ic(i) (k-T)+L(i,j)]b (k-T)bJ~’'pr{SQ{T) lo,-) 

i=l T=T+1 k=T j=l ^ 

2 2 0 - 

loii)l I Ic(i)t+L(i,j)]b (tji)] 
i«l j-1 t«0 ^ 


(1-p) (l-b^) 
I-Eq(I-p) 




i-b^d-p) 


(5-83) 


-169- 


Next, we will seek to replace the infinite stno over t in (5-83) by the 
finite sunt up to t>A plus a term approximating the remainder of the infinite 
sum. Suppose we have been sampling for A steps since the failure occurred. 
Recall the notation (5-66) ; 

P^(j) - Pr{)l(t)eSjlSQ(t-l) ,i,0} j»0,l,2 (5-84) 

If we stop computing the probabilities after A, we may approximate 

Pt(j) = P^(j) j=0,l,2, t>A (5-85) 

and consequently 

b^(t]i) = bQ(A|i) pJ^O) P^(j) t>A (5-86) 

Under the no-fail hypothesis, (5-81) implies that (5-85) is good for A >T. 
When the signature of the failure model is a constant, the same reasoning 
behind (5-81) may be applied, and we can see that P^(j) under failure mode i 

will reach a steady state value as t (the elapsed time) increases. In this 

) 

case, (5-85) is also a valid approximation for a large A. Generally, the 
failure signatures of interest are not necessarily constant. However, for 
sufficiently large A, the probability of continuing after A tlije steps 
(since the failure occurred) may be arbitrarily small. The error introduced 
by (5-85) in the risk (and performance probability) calculation is, conse- 
quently, small. Thus, we sec that the approximation (5-85) is a reasonable 
one for a sufficiently large A. 


Substituting (5-66) in (5-83) , we get 


(5-87) 


(5-88) 
(5-89) 

(5-90) 

is the unconditional false alarm probability, i.e. the probability of 
one false alarm over all time, t^ is the conditional expected delay to 
decision, given that a type i failure has occurred, and P(i,j) is the 
conditional probeibility of declaring a type j failure, given that failure i 
has occurred. From the ass^ption that Pr{S^(T) |o,-} <^1 axtd the steady 
condition (5-81), it can be shown that the mean time between false alarms 
is simply (i“t>Q) Now all the probabilities in (5-88)- (5-90) can be com- 
puted by using the algorithm of Subsection 5.2.2. Note that the risk ex- 
pression (5-87) consists only of finite sums. In contrast to the original 
risk expression (5-25) for the simplified sliding window rule, (5-87) can be 
evaluated with a reasonable amount of ccxnputational effort. With such an 


U**(f) = PpLp + 


(1-P ) I a(i) [a(i)t + I L(i,j)P(i,j)l 

i=l L 3-1 J 


where 


(1-p) (1-bp) 

I-Fq(I-p) 


2 A 

I It b (tji) + b (A 1 i) A + 
j=l t=0 ^ 


P^(0) 


A P (j) 

P(i,j) - I b (tli) +b (Aji) 
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approxlmatlon of the sequential risk, we will be able to consider the 
problem of determining the decision regions (the thresholds f) that 
minimize the risk. Vife will discuss the risk-min-'jiiization problem in the 
next section. 

It should be noted that we are not limited to only consider the risk 
as the objective function in the decision rule design problem. For exanqile, 
we could consider choosing a set of thresholds that minimize a weighted 
combination of certain detection probabilities (P(i,j)), the expected detec- 
tion delay (t^) , and the mean time between false alarms (d-b^) ^) . 

Although such an objective function will not result in a Bayesian design in 
general, it is a valid design criterion that may be useful for some applica- 
tion. Since these non- Bayesian objective functions are also functions of 
the performance indices (e;q>ected delay, etc.), they can be evaluated using 
the approach described in this subsection and the previous one. Although we 
will not directly consider the non-Bayesian design problems, the risk-minization 
algorittm and the choice of design parameters discussed in the next subsection 
are also a^^licable for these problems. 

5.3 Design of Decision Rule - Choice of Design Parameters and 

Minimization of the Risk 

For a given set of cost functions, prior PMF, and other design parameters, 
such as the window length W, and the matrices A and B used in forming the 
decision statistic z, the design of a suboptimal rule essentially amounts to 
determining a set of decision regions (characterized by the thresholds f) 
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which minimizes the sequential risk. An algorithm, which is especially 
suitable for this minimization problem, is describe -1 in subsection 5.?.1. 

The effectiveness of the resulting decision rule depends heavily on the 
choice of the above mentioned design parameters. Por ex 2 unple, an improperly 
chosen cost function that overly penalizes false alarms will result in 
prolonged decision delays under failures. A window that is too short may 
not utilize sufficient data to achieve good detection performance regardless 
of any choice of cost functions. This aspect of the decision rule design 
problem will be discussed in subsection 5.3,2. 

5,3.1 The Sequence-of-Ouadratic-Programs (SPP) Algorithm 
for Minimizing the Risk 

The risk minimization problem has two features that deserve special 
attention. Firstly, the sequential risk is not a simple function of the 
threshold f, and the derivative with respect to f is not readily availad^le. 
Secondly, calculatino the risk is a costly task. Therefore, the minimiun- 
seeking procedure to be used must require few function (risk) evaluations, 
and it must not require derivatives. The sequence-of-quadratic-programs 
(SOP) algorithm studied by Winfield [ 42 ] has been chosen to solve this 
problem, because it does not need any derivative information and it appears 
to require fewer function evaluations than other well-known algorithms I 42 ] . 
Furthermore, the SOP is simple, and it has quadratic convergence. We will 
describe the SOP for the 2-dimensional case, but the generalization to higher 


dimensions is straightforward. 
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Appllcations of the SQP to the risk minimization involves Iterating 
through the following steps: 

1) Initially, six different sets of thresholds are picked, and the 
corresponding sequential risks are calculated. The threshold set having the 
smallest risk is called the base point (denoted by f^) , 2 md the reioaining 
five sets are indexed according to increasing distance from f^, i.e. f^ 

is the closest emd f^ is the farthest from f^. 

2) A quadratic function described by 

u(x) = ^ x'Hx + c'x + u*(f*^) (5-91) 

^ s 

where 

X = f - f° (5-92) 

H is a 2x2 synsnetric matrix, and c is a 2-vector, is fitted through the 
six threshold-risk pairs (with the base point as the origin) . That is, H 
£md c are determined from the equations 

Us(f^) - Us(f°) - j (f^-f°) 'H(f^-f°) +c'(f^-f°), j-l,...,5 (5-93) 

Note that H does not have to be positive definite. The quadratic function 

f 0 5, 

u approximates the risk in a region spanned by if ,...,f ). 

3) The constraint region, K, is defined to be the square region centered 
at the base point with sides parallel to the axes. It is over R that the 
minimization of u(x) will be performed in Step 4. The length of the sides, 
y, is given by 

e 

7T" 


Y ” 2 X .99 X 


< 


(5-94) 
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0 5 

where B is the distance between f and £ , and y is limited by soeke ap- 
propriate Y ' which has the connotation as the maximum step size; K is a 
max 

constraint region reduction factor that is initially set to one but as we 

will see, it may be modified in subsequent steps according to the outcome 

of the minimization effort at each step. In other words, R is a square 

0 

inscribed in a circle centered at £ with radius .99 e K. 

4} The quadratic function u(x) obtained in Step 2 is minimized over the 

constraint region - this is a quadratic program. The square R, in fact, 

has been specified to make the quadratic programming solution procedure siiq>le. 

Let denote the solution. From (5-92) , x** corresponds to a threshold set 

f^ ■ + f^. Since u(x) is quadratic, a solution lying in the interior of R 

M M 

has to be a global minimum. Therefore, x is such a solution if i) x i> 

M 

in the interior of R, ii) H is positive definite, and iii) grad u(x )*0. 
Otherwise, the solution lies on the sides of the square. Aloi^ each side of 
the square one component of X is fixed and u(x) is a quadratic function of the 
remaining free component. Therefore, this is a l-dimensional analog of the 
previous condition, and similar reasoning can be applied to determine if a 
minimum of the l-dimensional quadratic lies on the side but not at the comers. 
There may be a maximum of ionix such minima. The smallest of these will be 
the solution. If no such minimum exists, the four comers of the square will 
be exmnined. The corner giving the smallest u will be the solution. 

5) If U* (f^) < U* (f^), f^ is used as the new base point (and re-labelled 
s s 

as f^) . Five points that are closest to the new base point are selected aikl 
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labelled as according to increasing distance from f^. the cons- 

traint region reduction faster K is set to one, and the procedure is 
repeated starting at Step 2. 

6) If U (f^)< U (f^) , the old base point is kept. Five points closest 
s s 

0 15 

to f are selected and labelled as f ,...>f according to increasing distance 

0 5 M 

from f . Since R excludes the old f (see (5-94 ) ) , f will always be closer 

0 5 m 

to f than the old f is, and f will be included in the five new point. 

This is the mechanism that provides the algorithm with the leaurning from 
mistakes. Hie reduction factor K is set to be smaller than one (say .95), 
so as to limit the searching a little closer to f^. Ihe procedure is repeated 
at Step 2. 

As convergence is approached, the minimum of the quadratic program will 
occur inside the square. The procedure is terminated when it is evident that 
a local minimum has been approached. This algorithm prescribes a sequence 
of quadratic prograans, hence the name SQP. Finally, we note that in theory 
all the thresholds and the associated risks may be kept so that they will 
be availadsle as candidates for the 5 closest points in Steps 5 or 6. In prac- 
tice, however, it is sufficient to store only a few more in addition to the 
active six. 

5.3.2 The Choice of Design Parameters 

There are basically two types of design parameters in the present 
methodology; those affecting the information content of the decision sta- 
tistics and those that play the role of weights in the risk expression. 
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n>« former type includes the window length W and the matrices A and B of 
subsection 5.1.4. The latter type includes the cost functions and the 
prior PMF. He will discuss them separately. 

The window length determines how much data is to be used in forming 
the decision statistics. If the signatures do not vanish, an increased 
window length will impiove the signal to noise ratio, and hence will also 
improve detection parfomumce. For ^le simplified sliding window rule 
(that uses only L^_^(k)}, a long window will cause the decision statistics 
to remember the past too well so that thev bec^ne sluggish in respondii^ to 
failures. However, such an increase in detection delay is absent when a 
full window of decision statistics (i.e. the con^lete t(k)} is \ised as In 
the original sliding window rule (5-8) . 

Viewing the decision problem as a W-sample, non- sequential problem, as 
we have mentioned previously, may cast some light on how to choose H for 
the simplified sliding window rule. From this view point, the choice of H 
becomes determining how many samples should be used. A more siaqplJfied 
situation may be obtained by considering each of the M failure modes separately, 
i.e. we now have M W-sample binary hypotheses testing problems at hand. Then, 
it is clear that W should be chosen such that it is not excessively long but 
still give a high signal (signature) to noise ratio for each laode. In addition, 
W must be large enough so that all failure signatuxes over the window are 
sufficiently different from one another. A reasonable choice for the window 
length is aatue va) ue somewhat greater than all the Ws for the binary hypotheses 
testing problems. With sesne assumed value of probabilities of false alarm 
and detection for the binary hyixjthesee test, reasonable choices of thresholds 
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can ba made. These thresholds iRsyf in turn, be used as initial guesses of 
the thresholds required for the SQP algorithn for the risk-ainisiixation. 

A refinwient of the choice of W can always be made after evaluating the 
performance probabilities of the window rule using the initial choice. 

Recall that the matrix A used in forming the Markov decision statistics 
s also plays the role of a memory parameter, and it may be chosen with the 
same considerations. We will let A be a bit more general here than in 
subsection 5.1.4., i.e. A is now a diagonal matrix with elements between 0 
aiui 1, and the diagonal elements of A may be different from one aixither. A 
diagonal A is used to provide a separate memory for each component of z. 

Consider the i-th conqionent of z 

z^(k+l) - a^z^(k) + b^r(k+l) (5-95) 

where is the ith diagonal element of A and b!^ is the ith row of B. 

The signal to iwise ratio of z is its mean under the ith failure divided by 
its steady state standard deviation. Since an close to one gives * a 
longer memory, should be chosen so that it is not extremely close to one 
and that the signal tc noise ratio reaches a good level in a reasonable time 
i.e. not sluggish (the same issue as using choosing B, however, 

there is not a simple general guideline as we have pointed out in subsection 
5.1.4. Me may, for exanqple, eaploy the batch processing techniqties and take 
the augmented vector [g| (0) , . . . ,g' (v) ] (where v is the batch size) to be the 
i-th row of B as in the exanple of subsection 5.1.4. Generally, each individual 
case will have to be examined separately. Just as in the choice of W, the 
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values of A aifkd B nay be adjusted should the detection perfoznanee be 
not acceptable. 

?he prior PMF nay be chosen according to the reliability of caai^i<mmni.i , 
for instance. In such cases the inverse of the Bernoulli paraneter has the 
slgnliicance as the mean tine to failure. In ether cases it nay be regarded 
as a design paraneter that is used to aid in specifying the tradeoff between 
false Siam and other (expected delay and cross-detection) costs. 

Ilie cost functions are chosen to reflect how undesirable false alaxns, 
delays in detection, and incorrect detections are relative to one another. 
Unli)ce H, A, and B, which affect the information content of the decision 
statistics and the risk in a complex manner, the cost functions enters the 
risk linearly. Hence a change in the cost functions can be acconanodated 
easily without having to re-compute all the performance indices (false alam 
probabilities, conditional expected delays in decision, and conditi<mal 
incorrect detection probabilities) , provided they have been stored. In order 
to arrive at an acceptable design, very often a few sets of cost fuxMtions 
■ay have to be tried. 

5.4 SuBsaary 

In this chapter we have described a Bayesian methodology for designing 
sequential decision rules for FOI. We have examined in detail the three 
step of the design process: 1) the definition of the decision rule structure, 

2i the evaluation of the sequential risk and detection perr ;mance, and 
3) the c)«3ice of design parasieters and risk-nininisation . 
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The suboptimal rules studied are time-invariant rules that partition 
the sample space of the decision statistics into decision regions. The two 
major types of decision rule examined are the sliding window rule and those 
that use the decision statistics z. The computational requir^oent for deter- 
mining different ferns of these decision ^ules has been assesscid. A numerical 
algorithm based on 1-dimensional quadrature formulas was developed to provide 
an approximate evaluation of the risk associated with two simple sequential 
decision rules. The SQP algorithm has been chosen to determine the set of 
thresholds that minimizes the risk. Finally, the issues involved in choosing 
the design peurameters such as the cost functions, prior PMF, window size W, 
euid the matrices A and B used to generate the statistic z, were discussed. 
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CHAPTER 6 

SEQUENTIAI. DECISION RULE DESIGN - A NUMERICAL EXAMPLE 
6.1 Introduction 

In Chapter 5 we described a methodology for designing sequential 
decision rules for FDI. Here, we will apply this design approach to a 
numerical exeimple in order to gain some insights into the nature of this 
methodology. 

In the previous cnepter we discussed the design of simplified sliding 

window rules, i.e. decision rules that use the log likelihood ratios (L ,) 

w-1 

corresponding to the earliest point in the window (Section 5.1.3). Because 

the computation of probabilities associated with such decision rules is 

very difficult, we proposed to approximate L , by a Markov process Z for 

w-1 

the purpose of design and performance analysis only, as this does not re- 
present an implementable algorithm (see Section 5.1.3). A quadrature 
algorithm based on Gaussian quadrature formulas was developed for ccxuputing 
the probabilities associated with decision rules using Markov statistics- 
(Section 5.2.2). Thus, this algorithm can be used to calculate the pro- 
babilities, and hence, the risk associated with the statistic Z. Such a 
risk provides an approximation to the risk associated with a sliding 

window rule (usina L ,). As a practical alternative, we proposed the use 

w-1 

of ai implementable Markov statistic z in place of ^ in the decision rule 

(Section 5.1.4). The advantages of using such rules are that the statistic 

2 is easier to ccffiipute than I and the quadrature algorithm can be applied 

w-i 

directly (since z is Markov) . Finally, the design of decision rules was 
formulated as the choice of a set c:f thresholds that minimizes the risk, 
and the SQP algorithm ^Section 5.3-1) was proposed as a means for performing 


the risk minimization. 
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Through an exercise of these design concepts and simulation studies, 

we can 1) assess the accuracy of the quadrature algorithm for computing 

(detection) probabilities, 2} determine if the approximation of the sliding 

window decision statistic L . by the Markov statistic I is reasonable, 

w— 1 

3) gain some experience with the SQP algorithm for minimizing the risk func- 
tion, and 4) compeure the performance of decision rules using the sliding 
window statistic with that using the simpler Markov statistic z . These are 

the goals of studying a numerical exai^le. We will describe the set-up of 
the ntanerical problem in Section 6.2, and we will discuss the results in 
Section 6.3. 

To facilitate discussions, we will introduce the following terminology. 

We will refer to a simulation of the sliding window rule by SW, a simulation 
of the rule using the Markov statistic z as Markov implementation (MI), and 
a simulation of the nonimplementable decision process using the approximation 
I as Markov approximation (MA) . 

6.2 The Numerical Example 

In the numerical example, we will consider the detection and identifi- 
cation of two possible failure modes (without identifying the failure times) . 

We assume that the residual is a 2-dimensional vector, and the vector failure 
signatures, g^(t), i=l,2, as functions of the elapsed time t are shown in 
Table 6-1. Ttie signature of the first failure mode is simply a constant 
vector. The first component of ^ 2 ^^^ ® constant, while the second 
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canponent is a ramp. We have chosen to examine these two types of sig- 
nature behavior (constant bias and ramp) because they are sii^>le aisi 
describe a large variety of failure signatures that aure c<»Bionly seen in 
practice (see, for example [9]). A constant bias represents a constant 
failure effect on the residuals and such a sig.'ature often occurs in 
practice (e.g. in the detection of biases in sensors) . Also, constant 
signatures can be used to appiroximate a slowly changing signature, while 
a ramp can be used to model failure effects that become more noticeable 
as time progresses. For simplicity, we have chosen V, the covariance of 
r, to be the identity matrix. 

We will design both a simplified sliding window rxile (that uses 1^2.^ 
a rule using the Maricov statistic z. In the remainder of this section we 
will discuss the choice of design parameters such as W, L, etc. 


g^(t) 


1 ■ 
5 


92 (t) 


.5 

.25+ .25t 


V 


1 

0 


0 

1 


TABLE 6-1; Failure signatures. 
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Recall in Section 5.3 that the window size W of a sliding window rule 

should be chosen so that a reasoned^ly high signature-to-noise ratio 

(denoted by r|^) for each failure mode is attained. That is, is the 

ratio between the expected value of ^ (k) , given that a type i failure has 

occurred at k-W+1, and the standard deviation of the i-th component of 

From the expressions for L , given in (5-9) (5-10) , it is easy to see that 

w— 1 

= Zq (i,i) (6-1) 

where Z (i,i) is the (i,i) element of Z , the covariemce of L For our 

0 0 

problem, an r| of better than 3 can be attained for each failure mode by 
using a window size of 8, and we will consider simplified sliding window 
rules with W=8. The approximation (£) of is given by (5-32) 

£(k+l) = A £(k) + B C(k) 

The values of A, BB', euid Z^ cem be directly determined using (5-33)- 
(5-41), euid they are shown in Table 6-2. 

The Heurkov statistic z is given by (5-49) 

2 (k+1) = A 2 (k) + B r(k+l) 

In order to achieve roughly the same memory span for and a for this 

problem, we have chosen A to be a diagonal matrix with both diagonal 
elements equal to .875 (which roughly gives a time const 2 uit of 6 steps for 
z) . The first row of B is set to be [1,.5], i.e. g^, because to detect a 
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type 1 failure we have to look for in the residual. The first con^nent 

of is a constant (*.5). Therefore, the (2,1) eleoent of B is set to be 

.5. The second component of g^ grows with the elapsed time. In order to 

exploit this behavior the (2,2) element of B has to l)e a relatively large 

number. In this example, we choose it to be 2, the value of the second 

component of g^ at a elapsed time of i* . The values of A, B, and the steady 

state covariance (E } of z are summarized in Table 6.3. 
z 

Recall from Section 5.1.3, the decision regions we have chosen take 
the form (5-24) 

S. = a -(k): 
j w-1 

L(k;j,w-l)>f^ 

e"^(j,w-l) [L(k;j,w-i)-f .]>e‘^(i,w-i) (i.(k|i,w-l)-f^J , 

ii'j} 

- {L, ,(k): L(k;j,W-l)<f ., j=l,2} 

where e (j,W-l) is the standard deviation of the j-th component of the 
statistic I , i.e. e{j,W-l) =Z (j,j). For the decision rule using 

Q 

z, we only need to substitute z for and the standard deviation 

E * (i,j5 for e (j,W-l), Using the data contained in Table 6-2 and 6-3, 
z 


* By choosing a large value for this ccanponent we are, in scane sense looking 
for a large bias. This means that in using the resulting static we will have 
difficulty in detecting this failure at or shortly after the onset, because 
the ramp component will he small. A good choice of B will depend on the sig- 
natures of all the failure modes exrjnined as a whole as well as the performance 
tradeoff prescribed by the use functions. As we have pointed out in Chapter 5 
the design of A and E represents an interesting open problem. 
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TABLE 


W - 8 


A 


lo 


BB' 


826 

.058 ■ 


116 

.837 


[10 

8.5 

18.5 

14.75 . 


'2.32 2 . 01 ' 

.2.01 4.58. 


6-2: Puameters for L, , and Jl. 

w-1 



0 

.875 . 


1 

.5 


.5 

2 



• 5.33 
. 6.40 


6.40 


I8.13J 


BVa' = 


■ 1.25 
. 1.50 


1.50 

4.25 


TABLE 6-3: Parameters for 2 
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the decision regions corresponding to rules using *nd t eure depicted 

in Figures 6-1 and 6-2 respectively. The decision rule design problem 
consists of choosing a set of thresholds (f^ and f^) that minimizes the 
risk. 

The cost functions and prior probability mass function used in this 
exan®)l" are slKsvm in Table 6-4. All incorrect identification of failures 
modes are penalized with 10 units, while correct failure mode identifications 
are not penalized. False alarm cost is 9 units- false alarms and incorrect fail- 
ure nK>de identifications are nearly equally undesiracle. The delay cost for both 
failure inodes is chosen to be one. Both failure inodes are assumed to be 
equally likely, and the mean time to failure is 5000 steps. The prior pro- 
bability y is fixed by this a priori information. Although it is not 
done in this study, we note that if the decision rules resulting from the 
present values of the peuraroeters L, C, p, W, A, and B are unsatisfactory, 
these parameters may be adjusted to get a new design. 

Recall that the risk is an infinite sum over t, the elapsed time 
.5-B3 ) . In Section 5.2.3 we proposed to approximate it as a finite sum 
(i.e, the original infinite sum truncated at t=A) jIus a remainder term. 

For dll decision thresholds considered in the present exaiq>le, the value of 
A is chosen to be 8, vdiich is large enough so that the remainder term is 
small, but small enough so that the computational load (due to the propa- 
gation of the conditional density) remain manageable. 

The steady state conditional density, given that no failure has 


occurred and no false alarm has been declared, is approximated by the 
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S* * ^ 

L(l,2) - L(2,l) 10 

L(l,l) = L(2,2) - 0 

Cl » * 1 

y(i,T) = .5p(l-p)^"^, i»l,2 
p = .0002 

TABLE 6-4; Cost Functions and Prior Probability. 
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conditlonal density propasjatcd 8 steps. That is, we assume ^8 (see 
Section 5.2.3). Experimentation with larger values of T indicated that 
the detection probabilities are not significantly changed and this ap- 
proximation is a reasonable one. To maintain consistency, T is kept at 8 
for all thresholds examined. 

In order to obtain accurate results, we have to use as large an n^^ 
and n (the number of points in the .. guerre and Hermite formulas) as is 
practical in the quadrature algorithm. In designing decision rules for 
this example, we will use n “ n » 20. Wit' A*8 and T»8, the evaluation 
of the risk for each threshold pair taJces approximately 6 minutes of CPU 
time on XBM370/168 computer. 

Recall (Section 5.3.1) that the six threshold pairs required to start 
the SQP algorithm may be chosen arbitrarily. No set rule is the best for 
all applications. Here, we will arbitrarily choose these thresholds to 
take on values within a range of threshold values. They are limited to be 
positive and within 2 to 4 standard deviations of the decision statistics. 

Ne t, we will discuss the results of applying our design method to 
this example. 

6.3 Results outd Discussion 

In this section, we will describe our experience with the decision rule 
design methodology through its application to the example introduced in the 
preceding section. We indicated that there are four main aspects of the 
design approach (see Section 6.1) that need to be examined. We will discuss 


them In the following. 
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Accuracy of th< guadratura algorithn 

The quadraturu algoziUn for oonputing probabilltlaa is baaed on 
Markov decision statistics. In ordor to determine its accuracy, we have 
to cmapare the probabilities associated with a decision rule using Markov 
statistics (e.g. z ) as computed by this algorithm to those obtained via 
Monte Carlo simulation of the sane decision process. 

A convenient set of probabilities to be examined include b^(t|i) 

(5-82) and 0^(t|i) defined by 

b^(t|i) - Pr{z(t)es^, z(t-l)eSQ,...,z(0)eSQ|i}, i,j-0,l,2 (6-2) 

t 

B.(tji) - I b.(sli), i-0,1,2, j-1,2, (6-3) 

J 8-0 ^ 

That is, b^(t|i) is the probability of continuing sas^ling up to elapsed 
time t and choosing decision j at t, given i is the true fail\ire mode, 
and B^(t]i) is the probability of stopping to declare a j-th failure at or 
before elapsed time t, given i is the true failure mode. Another indicator 
of the accuracy of the qxiadrature algorithm as discussed in Section 5.2.2 is 

T 

2 

P, ■ I 

j-0 

where 

P^(j) » Pr{z(t)€S^lz(t-l)eSQ z(0)eSQ,i} 


(6-5) 
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is the conditional probability of choosing j at t, given we have 
been sampling through elapsed time t-1 and the true failure mode is i. 
Therefore, if the algorithm is valid, will be close to 1. 

In Figures 6-3 to 6-7 we have shown b^Ctlo), b^Ctll), b^CtU), 

6^(t|l), and respectively for a decision rule using the statistic 

z (as described in Section 6.2) with thresholds f * [6.267, 11.867]'. 

Plotted against the elapsed time t, b^(tlo) shows the failure characteristics 
of the decision rule - a slowly decreasing b^(t|o) Indicates that a low 
false alarm rate is achieved. The rate of decrease of b^(tii) , i~l,2, 
indicates the speed of response of the decision process to failures, and 
$^(t|i), i“l,2, shows the ability of the decision rule to identify the 
failure correctly. 

The results obtained via Monte Carlo simulation using 10,000 trajec- 
tories are marked as MI (see terminology defined in Section 6.1). The 

(quadrature results using n ■» n - 20 are marked as Q20. The quadrature 

li H 

results using n “ n ■ 14 is also shown in the above figures and they 
L H 

are marked as Q14. 

Generally, the quadrature results Q20 are quite close to the simulation 
results MI, while the quadrature results (^14 are not. This shows that the 
quadrature algoritlun may be used to provide a reasonable approximation of 
the probabilities, and the accuracy of the approximation can be improved 
by increasing the number of grid points used in the algoiithm. The value 
of P^ for Q20 ranges fr<Mn .998 to 1.05, while that for Q14 ranges from 
.994 to 1.2. Therefore, the value of P^ also indicates that Q20 will 
provide a close estimate of MI and that Q14 will not. 
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yiGURE 6-3 ; b^(tjo) - Usinq z. 
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Now we caa^xe Q20 with MI. Fr«n the trend of bQ(t|Q) shewn in 
Figure 6-3, it is evident that Q20 will under-estimate the false alarm 
rate of the decision rule. Both bQ(til) and bQ(t|2) indicate that the 
quadrature results under-estimate the speed of detection. Q20 also slightly 
under-estimates the correct detection probabilities B^(tji). On the whole 
Q20 has provided a reasonably good estimate of the true probabilities 
associated with the decision rule using ti.e Markov statistic * and threshold 
16.287, 11.867]. 

Finally we note that if we use the quadratiure calculations for 
comparisons between different rules (e.g. fr- optimization purpose, or 
just to assess the effect of increasing W, adding a sensor, etc.), then 
small quadrature errors will probably have a small effect on the result. 

When the quadrature errors are consistently in the same direction, e.g. if the 
c[uadrature approximation consistently under-estimates false alarm probability 
and over-estimates detection delay, the relative performance of two decision 
rules (i.e. which is better than the other) will probably be correctly 
determined by the approximation method. 

The Markov approximation Z 

Here we will describe a conclusion drawn from a simulation (SW) of 
the sliding window rule with f = {8.849, 12.047]' and a simulation (MA) 
of the nonimplementable decision rule based on Z ?.nd using the sane 
thresholds. (Each simulation consists of 10,000 trajectories.) The 
resulting prolaabilities are shown in Figures 6-8 to 6-12. In addition, 
the probadjilities associated with the »3arkov approximation (f.) ^ re 
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FICTIRE 6-12 ; b^itll) - Sliding Window Rule and A|^roxiMticm. 
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cooqputad via the quadrature method using <■ 20, and tiiese are in- 

cluded in the above figures. Cniese probabilities are also marked Q20) . 

SW may be conq>ared with HA to determine the validity of the Markov ap- 
proximation t, and MA may be compared with Q20 to further assess the ac- 
curacy of the quadrature algorithm. 

From the simulation results (Figure 6-8) it is evident that the Markov 
approximation (MA) slightly under-estima tern the false alarm rate of the 
sliding window rule (SW) . However, the response of the Markov approximation 
to failures is very close to that of the sliding window rule (see Figure 6-8 
to 6-12) In the present example, . is a 7-th order process, while its 
approximation I is only of first order. In view of this fact, we can con- 
clude that £ provides a very reasonable and useful iq>proximation of 

The quadrature results Q20 are very close to MA. The value of 
ranges between .998 and 1.03. This is further evidence that the qu^rature 
algorithm is useful for obtaining estimates of pr^abilities. Furthermore, 
this indicates that the results of applying quadrature to the Markov ap- 
proximation provides a good approximution of SW. Thus one overall conclusion 
is that the quadrature technique for calculating proximate performance 
using the Markov approximation to the sliding window test represents an 
useful and accurate method for determining the performance of failure detec- 
tion rules and for comparing and optimizing such rules. We now turn to the 
last of these possibilities. 

The SQP algoritlBB 

The SQP algorithm is used in conjunction with the quadrature al-orithm 
(n. ■ n„ ■ 20) to find the risk-minimizing thres).old for both the sliding 
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wlndow rule and the rule using a. The successive choices of thresholds by 
SQP for the tv«o decision >*ules are plotted in Figures 6-13 and 6-14. 'ftte 
performance indices, such as the estimated mean time between false alarms 

I the detection delays (t^) , and the correct detection probabilities 
(P(i |i)). along with the risks associated with these choi<je» cl thresholds 
for both decision rules arc shown in Tables 6-5 and 6-6. 

Note that we have not carried the SQP algorithm far enough so that 
the successive choices of thresholds are, say, within .001 of each other. 

In Tables 6-5 and 6-6, it is evident that towards later Iterations the per- 
formance indices become relatively insensitive to small changes of the f's. 
This together with the fact that wo are only computing an a|^/*oximate Bayes 
risk (see Subsection 5.2.3) means that fine scale optimization is not 
worthwhile. Therefore, with the approximate risk, the SQP is most effeciently 
used to locate the zone where the minimum lies. That is, the SQP algorithm 
is to be terminated when it is evident that it has converged into a i eason- 
ably small region, such as in the present example (see Figure 6-13 and 6-14). 
Then %/e may choose the thresholds that give the smallest risk as the 
af^roximate solution of the minimization. 

In the event that thresholds that yield the smallest risk do not pro- 
vide the desired detection performance, the design parasteters, L, a, M, and 
W may be adjusted and the SQP may be repeated to get a new design. A prac- 
tical alternative method is to make use of the list of performance indices 
(Tables 6-5 ^md 6-6) , that are the by-product of SQP, and choose a pair of 
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ITER RISK MTBFA PCjl) P(2l2) 



8.0 

11.0 

8.9029 

359 

.868 

6.29 

.821 

5.71 


9.5 

11.5 

8.9282 

963 

.793 

8.24 

.930 

6.10 


9.0 

l .' i .5 

8.8856 

976 

.917 

7.75 

.841 

6.43 


8.5 

13.5 

8.5418 

744 

.973 

7.16 

.699 

6.65 


9.75 

13.0 

8.9742 

2014 

.909 

9.05 

.888 

6.74 


8.75 

11.75 

8.6818 

676 

.879 

7.28 

.658 

6.10 

1 

7.563 

12.654 

8.9433 

330 

.971 

6. 00 

.620 

6.10 

2 

8.840 

10.729 

8.9091 

510 

.768 

7.14 

.914 

5.72 

3 

8.616 

12.110 

8.8842 

671 

.912 

7.17 

.821 

6.2.1 

4 

8.748 

11.902 

8.8809 

703 

.891 

7.30 

.649 

6.16 

5 

8.801 

11.978 

8.8803 

743 

.89.3 

7.39 

.850 

6.20 

6 

8.825 

12.028 

8.8802 

766 

.695 

7.43 

.850 

6.22 

7 

9.180 

12.740 

8.8871 

994 

.875 

7.93 

.882 

6.29 

8 

8.849 

12.047 

8.8801 

783 

.895 

7.46 

.851 

6.23 


8.867 12.039 


TABLE 6-5 8 Performance of Sliding Window Rule with Thresholds 
Chosen by SQP. 
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ITER 


^2 

RISK 

MTBFA 

PdlD 

h 

P(2|2) 

*^2 


6.4 

12.5 

8.9389 

761 

.922 

7.60 

.772 

6.41 


7.2 

13.25 

9.1026 

1786 

.890 

9.55 

.869 

6.82 


6.0 

11.8 

8.9290 

443 

.914 

6.74 

.739 

6.04 


7.0 

12.0 

8.9947 

951 

.838 

8.74 

.883 

6.40 


5.5 

11.0 

8.9389 

252 

.907 

5.03 

.691 

5.61 


5.7 

12.2 

8.9406 

384 

.947 

6.29 

.651 

6.03 

1 

5.462 

12.940 

8.9643 

344 

,975 

5.97 

.541 

6.09 

2 

5.975 

10.996 

8.9311 

340 

.869 

6.56 

.781 

5.78 

3 

5.951 

11.528 

8.9289 

395 

.903 

6.62 

.746 

5.94 

4 

5.776 

10.771 

8.9337 

284 

.872 

6.21 

.759 

5.64 

5 

6.089 

11.667 

8.9279 

454 

.901 

6.88 

.763 

6.03 

6 

6.117 

11.545 

8.9279 

445 

.891 

6.90 

.775 

6.04 

7 

6.287 

11.867 

8.9289 

563 

.897 

7.27 

.787 

6.16 


6.158 

11.635 








TABLE 6-6 ; Perform 2 mce of Decision Rizle Using z with Thresholds 
Chosen by SQP. 
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thresholds that yields the desired performance. Usually the list of 
perfomumce indices provides sufficient information for deducing such a 
pair of thresholds heuristically. This choice of thresholds may be 
refined after its performance is determined via the quadrature algorithm. 
This approach will save the computations required to apply the SQP the 
second time (i.e. after we have adjusted the design peurameters L, c, euid M) . 

Sliding window rule vs decision rule using z 

Here, we will con^re the performance of a sliding window rule with 
that of a rule using z. We will consider these rules with thresholds de- 
termined with SQP based on the same cost functions and prior probability 
as described in Section 6.1. The thresholds for the sliding window rule 
are [8.849, 12.047] (the 8-th iteration of SQP, Table 6-5) , and the thres- 
holds of the other rule are [6.287, 11.687] (the 7-th iteration of SQP, 

Tedsle 6-6) . 

Probetbilities for these two decision rules based on simulations of 
10,000 trajectories are shown in Figures 6-15 to 6-19. In fact, these are 
the same results listed in Figures 6-3 to 6-12. Here, SW and MI are plotted 
on the same graph to facilitate the comparison. We note that MI has a 
higher false alarm rate them SW. The speed of detection for the two rules 
is similar. While MI has a slightly higher type-1 correct detection 
probability than SW, SW has a consistently higher ^(t |2) (type -2 correct 
detection probability) than MI. (Also see Tables 6-5 and 6-6). 

Based on the results (Table 6-5 and 6-6) we can make the following de- 
duction. By raising the thresholds of the rule using z appropriately, we 





10 


FIGURE 6- 17; bQ(t|2) 


- SW and MI. 



L 





FIGURE 6-19 ; S2^tl2) - SW and MI. 
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can decrease the false alarm rate of MI down to that of SW with an increase 
in detection delay and slightly improved correct detection probability for 
the type-' failure (with raa^ signature). Thus# the sliding window rule !• 
slightly superior to the rule using z in the sense that when both are designed 
to yield a comparable false alarm rate, the latter will have longer detec- 
tion delays and slightly lower correct detection probability (for typer2 
failure) . In view of the fact that a decision rule using z is imich sisqpler 
to implement, it is worthy of being considered as an alternative to the 
sliding window rule. 

In summary, the result of applying our decision rule design method to 
the present example is very good. Ihe quadrature algorithm has been shown 
to be useful, and the Mar)cov approximation of by J, is a valid one. 

The SQP algorithm has demonstrated its sing>llcity and usefulness through 
the numerical exanple. Finally, the Markov decision statistic z has been 
shown to be a mrthy alternative to the sliding window statistic 
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CMAPTER 7 

SUItWRY AHD RECOMMEWDATICMIS 

7.1 Suiwnary 

Thie goal of the research reported in this thesis is to develop a 
methodology for designing FDI systems that deliver good performance and 
are robust in the presence of modelling uncertainties. We have viewed 
the FOI process as consisting of two stages: a residual-generation process 

followed by a decision process. Since modelling errors affect residual- 
generation directly, the robustness issue is most effectively tackled in 
the design of this process. Naturally, the issue of detection performance 
is the main concern in designing a decision rule. Therefore, the FDI 
design problem is decongx>sed into two tasks: the design of a robust 

residual -generation process and the design of a high performance decision 
rule. 

Analytical redundancy is the basis for residual-generation. In 
Chapter 2, we presented a general formulation of the concept of anal;, xcal 
redundancy for LTI systems in terms of a parity space. A redundancy relation 
is simply a parity relation, which has to hold in the absence of a failure 
and noise. When such a relation is violated, a failure is evident. The use 
of parity functions (or parity vectors) as residuals for FDI was also 
extensively discussed. 

In the presence of modelling uncertainties and noise, the parity 
relatiorsof the system also become uncertain. Chapter 3 was devoted to the 
development of an approach for determining useful parity relations for FDI. 
The crucial probl*?fi of determining a set of appropriate coefficients for a 
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parity relation waa formulated as a nininax problem^ the objective of 
which was to minimize the worst case effect of noise and modelling error 
on the parity relation. Tlierefore, residual-generation ba&id m such 
parity relations is robust (or as robust as any relation can be for the 
particular system under consideration) . The notion of signature-to-parity- 
error ratio was also Introduced to aid in the choice of parity functions 
for residual-generation. 

The contribution of Chapters 2 and 3 rests on the precise charac- 
terization of analytical redundancy as parity relations and the fonmilation 
oi the parity coefficient design problem as a minimax optimization. These 
concepts have formed the basis of a new approach to the design of robust 
residual-generation processes. Further development of this design method 
is possible, and m will discuss some of the future research directions in 
Section 7.2. 

nie design of a decision rule in^ralves resolving the tradeoff asong 
the various detection performance issues. In this research we followed 
the Bayesian approach. In Chapter 4, w«* formulated the FDI decision 
process as a Bayes sequential decision problem. The cost functions and the 
prior probability mass function of the Bayes method could be regarded as 
parameters prescribing the tradeoff among the various performance issues. 
Although the optimal Bayes rule cannot be iiig}lemented, this formulation 
provides a structure from idiich simple subopt Imal rules can be constructed. 

In Chapter 5, we discussed s<»ne suboptimal decision rules that are 
based on the Bayes rule as well as other suboptisial rules. Just as in the 
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deaign of • Bay«s rula, the design of a suboptimal rule was also forstulated 
as a sk-mlnlnizing problem, a quadrature algorithm was developed to 
cos^te the detection probabilities and the risks associated with low dimen- 
sional problems. Thus, with a minimisation algoritiun that does not require 
derivative information, such as the SQP, the suboptimal rule design problem 
may be solved ntmerically. 

This design methodology was applied in a numerical exa8g>le. The 
results (discussed in chapter 6) indicate that this approach is a valid 
and useful one. We also note that the limitations on the dimensionality 
imposed by cosg>utational considerations need not lead to a corresponding 
severe limitation in the applicability of our technique. Specifically, 
our work in Chapter 2 and 3 was aimed at breaking up the dynamics of a 
system into low-dimcnsional pieces in order to isolate robust parity 
relations. Thus we see that using low-dimensional decision statistics 
serves two purposes: it allows us to address the issiie of robustness and 

it allows us to apply our decision rule algcrithn. 

7.2 Future Research Directions 

In the course of this study, a number of open problesis have been 
encountered, and they were mentioned in the text of this thesis. Some 
directions for future researcli based on these problems are outlined in 
the following: 

1) Ir aeCbwA 3.5, vm described the solution to a special 
case of the minimax parity coefficient design problem. A solution 
procedure for more general cases is needed. 
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2 ) In Section 3.3 we indicated that if wo postulate a PDF 
over r, the parity coefficient design problem can be re-formulated 
as a minimication problem, which is imich sin^ler to solve. This ap> 
proach should be exasiined in the future. 

3) Because a parity relation with a large signatur«-to-parity- 
error ratio (ir) is desirable for PDt, we may re-define the oojective 
of the parity coefficient design problem so that we consider 

max min iT(a,6,Y.x ,u ,i) 

a,$ yer ° ° 

s.t. oa'-l 

where i denotes the failure of interest. This problem is generally more 
difficult than the minimax problem, because the objective function x is 
more coo^lsx. 

4) The parity coefficient design procedure ex^unined in Chapter 3 
yields parity relations tnat are most suitable for robust open- loop 
residut.l-generation. The problem of determining parity coefficients 
(relations) for robust closed-loop residual-generation should oe addressed. 

5) In the present study, we did not consider in detail how to 
choose a set of "best" parity functions as measured by x or e* (the parity 
error) for the PDI of a given set of failures. A systematic method for 
selecting this set of parity relations is a useful tool to be developed in 
the future. 

* } The detection performance indices (such as correct detection 
probabilities) , associated with the decision rules of Chapter 5 are based 
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on exact characterization of failure signatures. The effect of modelling 
error is not accounted for. It vrill be useful to make use of information 
such as n to couple the effect of modelling uncertainties into the detec- 
tion performance indices. Then we can consider designing decision rules 
that minimize the risks based on the modified performance indices. 

7) The quadrature algorithm developed in Chapter 5 provides reason- 
able estimates of the probabilities. However, it consumes too much com- 
putation time. An improvement of this algorithm aimed at reducing 
coit 5 )utations is desiradjle. For example, we may consider a better placement 
of the grid points of the quadrature so that fewer points will be needed. 
V7ith reduced computational requirement, the quadrature algorithm may be 
used for higher dimensional problems. In addition, the utilization of other 
1-dimensional quadrature formulas in place of the Laguerre and Hermite 
formulas used in the present quadrature algorithm should be explored in am 
effort to arrive at a more efficient integration formula that is applicable 
to higher-dimensional problems as well as the 2-dimensional case considered 
here. 

8) In Chapter 6, the ( implementable) Markov decision statistic z 
was shown to be simple and useful. In order to generate such a statistic, a 
choice of the matrices A and B is needed. A procedure for selecting these 
matrices for high detection performance is needed. 

9) More experimentation is needed to confirm the general conclusions 
of our srudy of the decision rule optimization algorithm. 
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APPENDIX A ; Solution Procedure for the Minimax Problem (3-51) 

Consider me optimization problem 

min max [f (x),f (x)] (A-1) 

xex ^ ^ 

r\ 

where iK . Both and are continous over X which is a connected 

subset of R*'. For each f., there is a subset T. of X such that for any 

11 

point ariy point xex, That is, contains the 

global minima of f^ over X. We will assume that f^ has no other local 
minima. 

We can show that the solution to (A-1) can be determined as follows: 
1) By defining 

h(x) = meuc[f^(x) .f^Cx)] 
we can re-write (A-1) as 

min h(x) (A-2) 

xex 

Let n be the set 

n = {x: xer. and f.(x)> f, . (x) , i=l,2} 

It is clear that when fi is not empty, it contains the minimum of h over X. 

Assuming 12 is not empty, the solution of (A-2) is simply the element 

x*el2 such that 

h(x*) = min h(x) 
xel2 


(A-3) 
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Note that X* iDay not be unique. The solution of (A-3) is easy to compute 
if each has a single global minimum, because ^ will contain at most 
two points and the search required in (A-3) is extremely simple. When 
n is empty, the minimax solution can be found using the next step. 

2) Define 

A = {x; xex and = f 2 (x)} (A-4) 

C ““ c 

and let A denote the coroplement of A in X. Consider an element x e A . 

Because f^^ and f^ are continuous and X is connected, we can find a neighborhood 
W around x in A° such that either fj^(x)>f 2 (x) or f 2 (x)>fj^(x) for all xsW. 

Without loss of generality, we can assume fj^(x)>f 2 (x) in W. If a solution 
is not found by using step 1, we only need to consider x ^ T^. In this case, 
there is some other point x^ in W (hence in A^) such that f (x®) <f (x) 

(since has no local minima). Therefore, x cannot be a solution of (A-2) 
and does not contain the solution. The solution must lie in A. In this 
case, (A-1) becomes 

min fj^(x) (A-5) 

s.t xex 

f^ (x)-f 2 (x)=0 

and we have a constrained minimization problem. Note that the objective 
function of (A-5) may be replaced by f^Cx). 
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NOW we will apply the above result to the minimax problems considered 
in Section 3.5. First, consider (3-51) , where fj^, and X are 


f^(a') = as(Y )a' , 


i=l,2. 


X = {a' ; o«x'=l} 


where S(y^) is an n*n symmetric positive definite matrix and a is a (row) 
n-vector. It is well-known that S(y^) has a complete set of orthonormal 
eigenvectors. The minimum value of f^ is clearly the smallest eigenvalue 

of S(y^). Now we will show that f^ has no local miniuium other than the 

global ones. It is clear that the eigenvectors S(Y^) represent 

all possible local minima of f^ over X. Let y^^ correspond to the eigenvalue 

o of S(Y^) which is greater than the smallest eigenvalue a . (which 

1 min 

associated eigenvector y . ) . By tedcing a' = ay, + by . with b^O and 

mm 1 xnin 

2 2 2 2 

a +b =1, we have f . (a' ) = a a, + b o , < o, . Thus, y, cannot be a local 

1 1 min 1 1 

minimum and f, has no local minimxim other than y . . Consequently, the two- 
1 ■'min 

step solution technique is applicable to (3-51) . 

When the smallest eigenvalue of S(Y^) is not repeated, f^ has one global 

minimum at y . . (Due to symmetry, we can consider y . only and not - y . .) 

■'min min min 


If this is true for f^^ and f^, Q has at most two points and the solution 

of (A-3) (i.e. in step 1) can be readily determined. When a . of S(Y^) 

mxn 

is repeated, we can also show that we only need to consider at most two points 
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of n in step 1. Suppose is of multiplicity m, and we let Y be the 

matrix of the m eigenvectors associated with a . . (Note that Y is not 

min 

unique, but it has rank m.) Then 

= {a'; a' => Yz, where zBr" and z'zsl} 

To determine if we can find an a' 6 such that f^ (o' ) >f^ (ex' ) , it is 

only necessary to check if f^(a')^ f 2 (ot') where a* is the solution of 

min aS(Y^)ct 

a 

« 

s.t. a* = Yz 
z'z'=i 

Equivalently, a' = Yz, where z solves 

min z' [ y'S(Y^)Y]z (A-6) 

z'z*l 

2 

The solution to (A-6) is, of course, the eigenvector of Y'S(Y )Y (which is 

mxm, symmetric, positive-definite) associated with the smallest eigenvalue. 

— 2 
Note that z may not be unique (due to repeated eigenvalues of Y'S(Y )Y), 

amd hence, a' may not be unique, since all such a' are equivalent 

(give the same value f(a')) , we only have to consider one of them. As a 

result, there will be at most two points in n (which may be empty) in step 1. 

Next, we will apply the above solution procedure to the case where the 

parity structure contains actuator inputs. The minimax problem is of the 


form 
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min max 
Ye[Y^,Y^] 


[a X] S(Y) [a X]’ 


(A-7) 


s.t. aa'=l 


where S is the symmetric positive definite matrix given by (3-30) and X 
is a scalar. Assuming S is quadratic in y (a scalar parameter) , (A-7) 
is equivalent to 


min max [aX]S (y^) [aX] ' , [aX]s (y^) [aX] ' 


(A-8) 


s.t. c«x'=l 


with X = { [aX] ' : aa'=l}. Therefore, the 2-step solution procedure described 
above applies if we can show that [aX] S (y^) [aX] ' has no local minima other 
than the global one over X. According to multiplier theory [14 ] , the 
necessary condition for a local minimum of [aX]S(y^) [aX] ' to exist over X is 


[a X] 


®11 ®12 


®21 ^22 


= 6[a 0] 


(A-9) 


where we have shown S in the partitional form, and 0 is a non-zero scalar. 
Since S is positive-definite, (A-9) can be re-stated as 


“ ^®11 ” ®12 ®22 ® 21 ^ 


■®22 “ ®12 


(A-10) 


(A-11) 
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Therefore, in studying the minima of laX] S (y^) ICxA] ' we only need to consider 
X of the form tA-11). Using we have 

[aX]S(Y^) taX] ’ = 

Then, we can readily deduce that the global minimum is given by [a, X(a)], 
where a is the eigenvector of fSj^iL”®22^22^21^ (symmetric, positive-definite) 
corresponding to the minimum eigenvalue and X (a) is give"^ by (A-11) . Moreover , 
[oX]S (y^) [aX] ' has no local minima, because ^f^j^i~®2.2^22^21^^ local 

minima. 
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APPENDIX B: The Convergence of J. (0) to J..(0) 

K 


Let (^*(lfr(l) ) , . . . (k>r (1) , . . . ,r (k) ) , . . .) be the optimal stopping 

rule for the non-truncated sequential problem. Define 
$ ((^ '*^(l,r(l)),...,0 '^{kir{l) , . . . ,r(k) ) , . . .) such that 


<|)*'*^(k!r(l),...,r(k))= f (k;r(l) , . . . ,r(k)) , k=l,...,k-l 


(|> '^(K,r(l) ,...,r(k)) = 1 


I I 


*,K * 

That IS, $ is the same as $ , except the former imposes mandatory stopping 

*,K * * 

at time K, and ^ (k;r(l) , . . . ,r (k) ) , k>K is arbitrary. Since ($ ,D ) is 

optimail for the non-truncated problem, we have the difference A: 


*,K * * * 

A = U ($ ' ,D ) - U {$ ,D )> 0 

S 8 


(B-l) 


Furthermore, from (4-16), we have 


M k 


Ull I Ip'(kji,T) I E ip*(k;r(l),...,r(k)) (L((i,T),d*(K;r(l),...,r(K))) 
i=0 T=1 k=K 


+ C(i) (k-T)] 


M » 


+ I Iy(~,i,T) I E U)*(k;r(l),...,r(k))lL((i,T),d*(k;r(l),...,r(k)))l 
i=0 T®1 k“K 


+ E. J; (k;r(l),...,r(k))c(i) (k-T) 

k= 

m£Jc[T,k] 


(B-2) 


Note that in order for a sequential risk to be finite (as is true with 
00 

U (4>*,d'*)), J E ij;(k;r (1) , . . . ,r(k) ) 0 as and 

® \ V I » T 


k»K 


I 
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^ k B. \j»(k}r(l) , . . . ,z(k) ) must bo flnito. (Koto that in (B-2}> o(0)«0 
k-1 

bocanso thoro is no dolay coat if no failitfo has oociasirod* ) Sinpo b and o(i) 
arc bounded, the two toxns in (B-2) that are duo to L wA the toxa to 
c(i) (K-T) vMiish as IM. Bow eofwidor tho rwainiag ton in (B-2): 


M « » 

I I I U*(“;i.T) I B J>*(kjr(l),...,r(k)>c{i) (k-T)| 

i«0 T-0 k- 

aax[T,k] 


M • » 

< I ^ u»Cji.x)c<l)T I E V(k;ra>,,..,r(k)) 

i-Q T-0 k-K 


M • » 

I X yC»i,T)c(D J k B. _)^*(k>z(i),...,r(k)) (B-3) 

i-0 T»1 k-K 


From the prior discussion both tens on the right hand side of (k’3) vanish 

* K 

as k^. Therefore A-K) as Tho fact that ♦ ' belongs to tho class of 

stopping rules that teninate sampling at or before K implies 

Using (4-44) and (B-l) we can deduce 




As k-w, A-K) so that J_(0) J (0) . 

K * 
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